tuxji commented on code in PR #1098:
URL: https://github.com/apache/daffodil/pull/1098#discussion_r1367305017


##########
README.md:
##########
@@ -59,32 +59,42 @@ Compile source code:
 
     sbt compile
 
-### Tests
+### Test
 
-Run unit tests:
+Check all unit tests pass:
 
     sbt test
 
-Run slower integration tests:
+Check all integration tests pass:
 
     sbt daffodil-test-integration/test
 
-### Command Line Interface
+### Format
 
-Build the command line interface (Linux and Windows shell scripts in
-`daffodil-cli/target/universal/stage/bin/`; see the [Command Line
-Interface] documentation for details on their usage):
+Check format of source and sbt files:
+
+    sbt scalafmtCheckAll scalafmtSbtCheck
+
+Reformat source and sbt files if necessary:
+
+    sbt scalafmtAll scalafmtSbt

Review Comment:
   Yes, we can write a sbt task which runs clang-format on the C files and 
errors if it finds and reports any diffs.  However, some people may get an 
error running the task (especially if the task's purpose is to run all the 
language formatters from a single command) since clang-format is not commonly 
installed on many (most?) machines. 



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_main.c:
##########
@@ -21,8 +21,8 @@
 #include <string.h>           // for strcmp
 #include "cli_errors.h"       // for CLI_DIAGNOSTICS, CLI_FILE_CLOSE, 
CLI_FILE_OPEN
 #include "daffodil_getopt.h"  // for daffodil_cli, daffodil_parse, 
daffodil_parse_cli, parse_daffodil_cli, daffodil_unparse, daffodil_unparse_cli, 
DAFFODIL_PARSE, DAFFODIL_UNPARSE
-#include "errors.h"           // for continue_or_exit, print_diagnostics, 
Error, Diagnostics, Error::(anonymous)
-#include "infoset.h"          // for get_infoset, walk_infoset, PState, 
UState, parse_data, unparse_infoset, InfosetBase, VisitEventHandler
+#include "errors.h"           // for continue_or_exit, Diagnostics, 
print_diagnostics, Error
+#include "infoset.h"          // for ParserOrUnparserState, PState, UState, 
get_infoset, walk_infoset, parse_data, unparse_infoset, InfosetBase, 
VisitEventHandler

Review Comment:
   My viewpoint (coming back from developing Java to developing C) is that it 
would be nice if C header lines tell you which symbols they import just like 
Java import lines tell you which symbols they import.  However, I can tell iwyu 
not to create these comments if you all don't like them.  Then the remaining 
use case for iwyu is to sort and canonicalize C header lines just like 
Java/Scala tools sort and canonicalize Java/Scala import lines.  



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_main.c:
##########
@@ -107,17 +105,24 @@ main(int argc, char *argv[])
         output = fopen_or_exit(output, daffodil_unparse.outfile, "w");
 
         // Initialize our infoset's values from the XML data
-        const bool   CLEAR_INFOSET = true;
+        const bool CLEAR_INFOSET = true;
         InfosetBase *infoset = get_infoset(CLEAR_INFOSET);
-        XMLReader    xmlReader = {xmlReaderMethods, input, NULL, NULL};
+        XMLReader xmlReader = {xmlReaderMethods, input, NULL, NULL};
         error = walk_infoset((VisitEventHandler *)&xmlReader, infoset);
         continue_or_exit(error);
 
         // Unparse our infoset to the output file
-        UState ustate = {output, 0, NULL, NULL, 0, 0};
+        UState ustate = {{output, 0, NULL, NULL}, 0, 0};
         unparse_infoset(infoset, &ustate);
-        print_diagnostics(ustate.diagnostics);
-        continue_or_exit(ustate.error);
+        print_diagnostics(ustate.pu.diagnostics);
+        continue_or_exit(ustate.pu.error);
+
+        // Any diagnostics will fail the unparse if validate mode is on
+        if (daffodil_parse.validate && ustate.pu.diagnostics)

Review Comment:
   You're right on both points.  It would be even easier to merge 
`daffodil_parse_cli` and `daffodil_unparse_cli` into `daffodil_pu_cli` than it 
would be to add `validate` to `daffodil_unparse_cli`.  
   
   I'll wait for Mike and you to respond to the rest of my PR comments before I 
do another pass on the PR, so please weigh in on how much you want the other 
suggested changes.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)

Review Comment:
   No, I don't think `pu` is too short and I think your point about consistency 
is good.  Will rename.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaDocument.scala:
##########
@@ -212,6 +212,8 @@ final class SchemaDocument private (xmlSDoc: 
XMLSchemaDocument)
   final override lazy val optLexicalParent = Some(xmlSDoc)
   final override lazy val optXMLSchemaDocument = Some(xmlSDoc)
 
+  final lazy val version = (xml \ "@version").text

Review Comment:
   How do you change the line to return an Option[String]?  



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -50,6 +51,53 @@ trait BinaryValueCodeGenerator {
     if (e.hasFixedValue && e.fixedValueAsString.nonEmpty) {
       validateFixed(deref)
     }
+
+    // Check if the element's value is restricted to a set of enumerations
+    if (e.typeDef.optRestriction.exists(_.hasEnumeration)) {
+      // Split the enumeration values into an array of strings ready to be 
inserted into the C code
+      val enumerationValues = 
e.typeDef.optRestriction.flatMap(_.enumerationValues).get
+      val enums = enumerationValues.split('|').map(_.replaceAll("\\\\(\\W)", 
"$1"))

Review Comment:
   I'll try that expression, thanks.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)
+{
+    if (count < minOccurs || count > maxOccurs)
+    {
+        // Array count is not within bounds, so report error
+        static Error error = {ERR_ARRAY_BOUNDS, {0}};
+        error.arg.s = name;
+        pustate->error = &error;
+    }
+}
+
+// Validate element's value is same as its fixed attribute
+
+void
+validate_fixed_attribute(bool same, const char *element, ParserOrUnparserState 
*pustate)
+{
+    if (!same)
+    {
+        // Element is not same as its fixed attribute, so diagnose problem
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_FIXED_VALUE, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+// Validate element's value matches an enumeration (either floating point or 
integer)

Review Comment:
   Yes, the comment was meant to apply to the next 3 functions (although I 
added `validate_hexBinary_enumeration` later without updating the comment) 
instead of only the next function.  I've given each function its own (clearer) 
comment now.



##########
README.md:
##########
@@ -59,32 +59,42 @@ Compile source code:
 
     sbt compile
 
-### Tests
+### Test
 
-Run unit tests:
+Check all unit tests pass:
 
     sbt test
 
-Run slower integration tests:
+Check all integration tests pass:
 
     sbt daffodil-test-integration/test
 
-### Command Line Interface
+### Format
 
-Build the command line interface (Linux and Windows shell scripts in
-`daffodil-cli/target/universal/stage/bin/`; see the [Command Line
-Interface] documentation for details on their usage):
+Check format of source and sbt files:
+
+    sbt scalafmtCheckAll scalafmtSbtCheck
+
+Reformat source and sbt files if necessary:
+
+    sbt scalafmtAll scalafmtSbt
+
+### Build

Review Comment:
   Yes, I'll document the publish commands too and explain Ivy is also used by 
sbt to build schema-only projects.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_getopt.c:
##########
@@ -85,6 +85,12 @@ parse_daffodil_cli(int argc, char *argv[])
             daffodil_parse.outfile = optarg;
             daffodil_unparse.outfile = optarg;
             break;
+        case 'r':
+            // Ignore "-r root" option/optarg
+            break;
+        case 's':
+            // Ignore "-s schema" option/optarg
+            break;

Review Comment:
   A workflow I sometimes use with code generators, DFDL schemas, and TDML 
tests is to run `daffodil generate` on the DFDL schemas, run a pair of 
`daffodil parse` and `c/daffodil parse` commands on the same data files, and 
check that they both produce the same exit status and similar infosets.  This 
manual workflow is finer-grained than and sometimes easier to debug than 
running `daffodil test` on TDML tests using the same data and infosets.  I 
thought it could be nice if I could run the same `c/daffodil parse` command as 
the `daffodil parse` command by appending only `c/` to `daffodil` without 
having to change any of the options.
   
   I thought I could hide this behavior from users (the usage message doesn't 
mention the -r and -s options) while making life easier for developers and 
testers who know about this hidden behavior for greater compatibility between 
`daffodil` and `c/daffodil`.  I can revert this source file if you think users 
would be surprised and confused that `c/daffodil` allows -r and -s options.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/errors.c:
##########
@@ -80,9 +80,11 @@ error_lookup(uint8_t code)
     static const ErrorLookup table[ERR_ZZZ] = {

Review Comment:
   Yes, `ERR__SIZE` and `_ERR_SIZE` sort last.  I agree a name like 
`ERR__NUM_CODES` would be more meaningful than `ERR_ZZZ` while still sorting 
last.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/errors.c:
##########
@@ -80,9 +80,11 @@ error_lookup(uint8_t code)
     static const ErrorLookup table[ERR_ZZZ] = {
         {ERR_ARRAY_BOUNDS, "%s count out of bounds\n", FIELD_S},
         {ERR_CHOICE_KEY, "no match between choice dispatch key %" PRId64 " and 
any branch key\n", FIELD_D64},
+        {ERR_ENUM_MATCH, "value of element '%s' does not match any of its 
enumerations\n", FIELD_S},
         {ERR_FIXED_VALUE, "value of element '%s' does not match value of its 
'fixed' attribute\n", FIELD_S},
         {ERR_HEXBINARY_ALLOC, "error allocating hexBinary memory -- %" PRId64 
" bytes\n", FIELD_D64},
         {ERR_LEFTOVER_DATA, "Left over data, at least %i bit(s) remaining 
after end of parse\n", FIELD_C},
+        {ERR_OUTSIDE_RANGE, "value of element '%s' is outside its allowed 
range\n", FIELD_S},

Review Comment:
   Fair point, although I'll rename to `ERR_RESTR_ENUM`, `ERR_RESTR_FIXED`, and 
`ERR_RESTR_RANGE` instead to keep these symbols shorter.  Note that pattern, 
length, digits, etc., apply only to text data, right?



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)
+{
+    if (count < minOccurs || count > maxOccurs)
+    {
+        // Array count is not within bounds, so report error
+        static Error error = {ERR_ARRAY_BOUNDS, {0}};
+        error.arg.s = name;
+        pustate->error = &error;
+    }

Review Comment:
   The `validate_array_bounds` function is performing defensive programming 
against a buffer attack.  The size of the buffer for a variable length array is 
statically defined at schema compilation time and the C code cannot increase 
the buffer's size at runtime if the variable length array says its size is 
larger than the schema allowed for.  Setting `pu->error` ensures that the parse 
will terminate immediately without reading any more data from the stream.
   
   The other functions such as `validate_fixed_attribute` inspect data that is 
well-formed but may or may not be valid according to restrictions on that 
schema element.  We can afford to accumulate (and print later) as many 
diagnostic warnings as we can fit into the diagnostics buffer.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)
+{
+    if (count < minOccurs || count > maxOccurs)
+    {
+        // Array count is not within bounds, so report error
+        static Error error = {ERR_ARRAY_BOUNDS, {0}};
+        error.arg.s = name;
+        pustate->error = &error;
+    }
+}
+
+// Validate element's value is same as its fixed attribute
+
+void
+validate_fixed_attribute(bool same, const char *element, ParserOrUnparserState 
*pustate)
+{
+    if (!same)
+    {
+        // Element is not same as its fixed attribute, so diagnose problem
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_FIXED_VALUE, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+// Validate element's value matches an enumeration (either floating point or 
integer)
+
+void
+validate_floatpt_enumeration(double number, size_t num_enums, double enums[], 
const char *element,
+                             ParserOrUnparserState *pustate)
+{
+    bool match_found = false;
+    for (size_t i = 0; !match_found && i < num_enums; i++)
+    {
+        if (number == enums[i])
+        {
+            match_found = true;
+        }
+    }
+    if (!match_found)
+    {
+        // Number does not match any enumeration, so diagnose problem
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+void
+validate_hexbinary_enumeration(const HexBinary *hexBinary, size_t num_enums, 
HexBinary enums[],
+                               const char *element, ParserOrUnparserState 
*pustate)
+{
+    bool match_found = false;
+    for (size_t i = 0; !match_found && i < num_enums; i++)
+    {
+        bool same_lengths = hexBinary->lengthInBytes == enums[i].lengthInBytes;
+        if (same_lengths && memcmp(hexBinary->array, enums[i].array, 
hexBinary->lengthInBytes) == 0)
+        {
+            match_found = true;
+        }
+    }
+    if (!match_found)
+    {
+        // HexBinary does not match any enumeration value, so report error
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+void
+validate_integer_enumeration(int64_t number, size_t num_enums, int64_t 
enums[], const char *element,
+                             ParserOrUnparserState *pustate)
+{
+    bool match_found = false;
+    for (size_t i = 0; !match_found && i < num_enums; i++)
+    {
+        if (number == enums[i])
+        {
+            match_found = true;
+        }
+    }
+    if (!match_found)
+    {
+        // Number does not match any enumeration value, so report error
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+// Validate element's value fits within its schema facets' range
+
+void
+validate_schema_range(bool within_range, const char *element, 
ParserOrUnparserState *pustate)
+{
+    if (!within_range)

Review Comment:
   Correct, comment added to clarify.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
 
     val initERDStatement = ""
     val parseStatement =
-      s"""$indent1$indent2    parse_validate_fixed($field == $fixed, 
"$localName", pstate);
-         |$indent1$indent2    if (pstate->error) return;""".stripMargin
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &pstate->pu);
+         |$indent1$indent2    if (pstate->pu.error) return;""".stripMargin
+    val unparseStatement =
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &ustate->pu);
+         |$indent1$indent2    if (ustate->pu.error) return;""".stripMargin
+    cgState.addSimpleTypeStatements(initERDStatement, parseStatement, 
unparseStatement)
+  }
+
+  // Generate C code to validate a primitive element matches one of its 
enumeration values.
+  // May be replaced by a more specialized function in another trait for some 
types.
+  private def valueValidateEnumeration(
+    e: ElementBase,
+    deref: String,
+    enums: Array[String],
+    cgState: CodeGeneratorState,
+  ): Unit = {
+    val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+    val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+    val localName = cgState.cName(e)
+    val field = s"instance->$localName$deref"
+
+    // Extra initialization only for hexBinary enumerations
+    val arraysName = s"arrays_$localName"
+    val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+      s"{$arraysName[$index], ${s.length / 2}, false}"
+    }
+    val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ", 
", "}"))
+    val enumsLenMax = enums.map(_.length / 2).max

Review Comment:
   First change is easier, so will accept that change.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/DaffodilCCodeGenerator.scala:
##########
@@ -259,6 +260,7 @@ object DaffodilCCodeGenerator
   def generateCode(gram: Gram, cgState: CodeGeneratorState): Unit = {
     gram match {
       case g: AlignmentFill => alignmentFillGenerateCode(g, cgState)
+      case g: AssertBooleanPrim => noop(g)

Review Comment:
   We have a schema with assertions which we want daffodil to evaluate for the 
additional validation they perform.  They aren't meant to cause any 
backtracking, only print diagnostics (their failureType is "recoverableError"). 
 At the same time, we want daffodilC to ignore these assertions silently since 
we would need to support more of the DFDL expression language than the 
generator currently supports.
   
   Yes, it could be a good idea to add code to print a warning if the assertion 
does not have failureType="recoverableError", which would mean the schema might 
be using the assertion for backtracking.  That will have to wait for another 
time since it's non-trivial to find the right symbols to inspect.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.h:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef VALIDATORS_H
+#define VALIDATORS_H
+
+// clang-format off
+#include <stdbool.h>  // for bool
+#include <stddef.h>   // for size_t
+#include <stdint.h>   // for int64_t
+#include "infoset.h"  // for ParserOrUnparserState
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+extern void validate_array_bounds(const char *name, size_t count, size_t 
minOccurs, size_t maxOccurs,
+                                  ParserOrUnparserState *pustate);
+
+// Validate element's value is same as its fixed attribute
+
+extern void validate_fixed_attribute(bool same, const char *element, 
ParserOrUnparserState *pustate);
+
+// Validate element's value matches an enumeration (either floating point or 
integer)

Review Comment:
   Yes, the comment was talking about all 3 functions and now I've given each 
function its own comment.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
 
     val initERDStatement = ""
     val parseStatement =
-      s"""$indent1$indent2    parse_validate_fixed($field == $fixed, 
"$localName", pstate);
-         |$indent1$indent2    if (pstate->error) return;""".stripMargin
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &pstate->pu);
+         |$indent1$indent2    if (pstate->pu.error) return;""".stripMargin
+    val unparseStatement =
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &ustate->pu);
+         |$indent1$indent2    if (ustate->pu.error) return;""".stripMargin
+    cgState.addSimpleTypeStatements(initERDStatement, parseStatement, 
unparseStatement)
+  }
+
+  // Generate C code to validate a primitive element matches one of its 
enumeration values.
+  // May be replaced by a more specialized function in another trait for some 
types.
+  private def valueValidateEnumeration(
+    e: ElementBase,
+    deref: String,
+    enums: Array[String],
+    cgState: CodeGeneratorState,
+  ): Unit = {
+    val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+    val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+    val localName = cgState.cName(e)
+    val field = s"instance->$localName$deref"
+
+    // Extra initialization only for hexBinary enumerations
+    val arraysName = s"arrays_$localName"
+    val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+      s"{$arraysName[$index], ${s.length / 2}, false}"
+    }
+    val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ", 
", "}"))
+    val enumsLenMax = enums.map(_.length / 2).max
+    val arraysInit =
+      s"""$indent1$indent2    uint8_t $arraysName[][$enumsLenMax] = 
${hexEnumsInit.mkString(
+          "{",
+          ", ",
+          "}",
+        )};\n"""

Review Comment:
   I have not seen any schemas with hex binary enums yet, so let's put off any 
changes until we see hex enums actually used.  I only found out through 
experimentation that daffodil supports hex enums and thought daffodilC should 
try to support them too.  I think your imagination that most hex enums are 
close in length is probably right anyway.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
 
     val initERDStatement = ""
     val parseStatement =
-      s"""$indent1$indent2    parse_validate_fixed($field == $fixed, 
"$localName", pstate);
-         |$indent1$indent2    if (pstate->error) return;""".stripMargin
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &pstate->pu);
+         |$indent1$indent2    if (pstate->pu.error) return;""".stripMargin
+    val unparseStatement =
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &ustate->pu);
+         |$indent1$indent2    if (ustate->pu.error) return;""".stripMargin
+    cgState.addSimpleTypeStatements(initERDStatement, parseStatement, 
unparseStatement)
+  }
+
+  // Generate C code to validate a primitive element matches one of its 
enumeration values.
+  // May be replaced by a more specialized function in another trait for some 
types.
+  private def valueValidateEnumeration(
+    e: ElementBase,
+    deref: String,
+    enums: Array[String],
+    cgState: CodeGeneratorState,
+  ): Unit = {
+    val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+    val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+    val localName = cgState.cName(e)
+    val field = s"instance->$localName$deref"
+
+    // Extra initialization only for hexBinary enumerations
+    val arraysName = s"arrays_$localName"
+    val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+      s"{$arraysName[$index], ${s.length / 2}, false}"
+    }
+    val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ", 
", "}"))
+    val enumsLenMax = enums.map(_.length / 2).max
+    val arraysInit =
+      s"""$indent1$indent2    uint8_t $arraysName[][$enumsLenMax] = 
${hexEnumsInit.mkString(
+          "{",
+          ", ",
+          "}",
+        )};\n"""
+
+    val (enumsInit, extraInit, primType, valType, fieldArg) = 
e.optPrimType.get match {
+      case PrimType.Double | PrimType.Float => (enums, "", "double", 
"floatpt", field)
+      case PrimType.HexBinary => (hexEnums, arraysInit, "HexBinary", 
"hexbinary", s"&$field")
+      case _ => (enums, "", "int64_t", "integer", field)

Review Comment:
   I don't think I ever checked dates, times, or arbitrary length integers 
longer than 64 bits. I am sure they would fail somewhere else too since the 
code generator doesn't support them at all (it won't be able to find the right 
primitive C type, for example).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to