Re: [PR] Add schema validation code to C generator [daffodil]

via GitHub Fri, 20 Oct 2023 07:12:53 -0700


stevedlawrence commented on code in PR #1098:
URL: https://github.com/apache/daffodil/pull/1098#discussion_r1366881382



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_getopt.c:
##########
@@ -85,6 +85,12 @@ parse_daffodil_cli(int argc, char *argv[])
             daffodil_parse.outfile = optarg;
             daffodil_unparse.outfile = optarg;
             break;
+        case 'r':
+            // Ignore "-r root" option/optarg
+            break;
+        case 's':
+            // Ignore "-s schema" option/optarg
+            break;

Review Comment:
   I'm not sure I understand this, why of options to just ignore them? Seems 
that could be very confusing.



##########
README.md:
##########
@@ -59,32 +59,42 @@ Compile source code:
 
     sbt compile
 
-### Tests
+### Test
 
-Run unit tests:
+Check all unit tests pass:
 
     sbt test
 
-Run slower integration tests:
+Check all integration tests pass:
 
     sbt daffodil-test-integration/test
 
-### Command Line Interface
+### Format
 
-Build the command line interface (Linux and Windows shell scripts in
-`daffodil-cli/target/universal/stage/bin/`; see the [Command Line
-Interface] documentation for details on their usage):
+Check format of source and sbt files:
+
+    sbt scalafmtCheckAll scalafmtSbtCheck
+
+Reformat source and sbt files if necessary:
+
+    sbt scalafmtAll scalafmtSbt

Review Comment:
   Is it worth adding commands for what you use to format h C code?
   
   Maybe for a future PR, it might be worth adding an sbt task that executes 
those commands with the right options (e.g. `sbt cfmtAll`) so we don't have to 
remember them. Could also be useful to add a step in our github actions lint 
job to run that and error if git diff reports and changes.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_main.c:
##########
@@ -21,8 +21,8 @@
 #include <string.h>           // for strcmp
 #include "cli_errors.h"       // for CLI_DIAGNOSTICS, CLI_FILE_CLOSE, 
CLI_FILE_OPEN
 #include "daffodil_getopt.h"  // for daffodil_cli, daffodil_parse, 
daffodil_parse_cli, parse_daffodil_cli, daffodil_unparse, daffodil_unparse_cli, 
DAFFODIL_PARSE, DAFFODIL_UNPARSE
-#include "errors.h"           // for continue_or_exit, print_diagnostics, 
Error, Diagnostics, Error::(anonymous)
-#include "infoset.h"          // for get_infoset, walk_infoset, PState, 
UState, parse_data, unparse_infoset, InfosetBase, VisitEventHandler
+#include "errors.h"           // for continue_or_exit, Diagnostics, 
print_diagnostics, Error
+#include "infoset.h"          // for ParserOrUnparserState, PState, UState, 
get_infoset, walk_infoset, parse_data, unparse_infoset, InfosetBase, 
VisitEventHandler

Review Comment:
   Just curious, do you find it useful to include the functions each header 
includes? Seems like it's just noise to me. I guess this is auto-generated so 
not a big deal, but personally I think it would be nice if this could be 
disabled. Especially since some of these lines get pretty long.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/errors.c:
##########
@@ -80,9 +80,11 @@ error_lookup(uint8_t code)
     static const ErrorLookup table[ERR_ZZZ] = {
         {ERR_ARRAY_BOUNDS, "%s count out of bounds\n", FIELD_S},
         {ERR_CHOICE_KEY, "no match between choice dispatch key %" PRId64 " and 
any branch key\n", FIELD_D64},
+        {ERR_ENUM_MATCH, "value of element '%s' does not match any of its 
enumerations\n", FIELD_S},
         {ERR_FIXED_VALUE, "value of element '%s' does not match value of its 
'fixed' attribute\n", FIELD_S},
         {ERR_HEXBINARY_ALLOC, "error allocating hexBinary memory -- %" PRId64 
" bytes\n", FIELD_D64},
         {ERR_LEFTOVER_DATA, "Left over data, at least %i bit(s) remaining 
after end of parse\n", FIELD_C},
+        {ERR_OUTSIDE_RANGE, "value of element '%s' is outside its allowed 
range\n", FIELD_S},

Review Comment:
   Just a minor naming suggestion, what about something like 
`ERR_RESTRICTION_ENUM`, `ERR_RESTRICTION_RANGE` so that restriction errors are 
grouped together? Might be more useful if the other checks are ever added, e.g. 
pattern, lengths, digits, etc.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)
+{
+    if (count < minOccurs || count > maxOccurs)
+    {
+        // Array count is not within bounds, so report error
+        static Error error = {ERR_ARRAY_BOUNDS, {0}};
+        error.arg.s = name;
+        pustate->error = &error;
+    }
+}
+
+// Validate element's value is same as its fixed attribute
+
+void
+validate_fixed_attribute(bool same, const char *element, ParserOrUnparserState 
*pustate)
+{
+    if (!same)
+    {
+        // Element is not same as its fixed attribute, so diagnose problem
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_FIXED_VALUE, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+// Validate element's value matches an enumeration (either floating point or 
integer)

Review Comment:
   Looking at the code, this is used only for double and float primitives, it's 
not used for integer types as the comment suggests. And I guess that could be 
possible for <= 32-bit ints which can be exactly represented by doubles, I 
thought you might be doing that at first.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/DaffodilCCodeGenerator.scala:
##########
@@ -259,6 +260,7 @@ object DaffodilCCodeGenerator
   def generateCode(gram: Gram, cgState: CodeGeneratorState): Unit = {
     gram match {
       case g: AlignmentFill => alignmentFillGenerateCode(g, cgState)
+      case g: AssertBooleanPrim => noop(g)

Review Comment:
   Is there value in having a `noop()` variant that creates a warning? Some 
schemas might rely on assertions for backtracking to work correctly, so getting 
a warning that says something like "this schema uses something that is not 
supported by the Code generator" might be useful to avoid unexpected issue. 
Some grammars probably wouldn't need a warning (e.g. CaptureContentLength.*), 
but some like these might be more useful?



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_getopt.h:
##########
@@ -42,7 +42,7 @@ extern struct daffodil_parse_cli
     const char *infoset_converter;
     const char *infile;
     const char *outfile;
-    bool        validate;
+    bool validate;

Review Comment:
   I like this change :+1: . I've never been a huge fan of aligning variable 
names in C.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
 
     val initERDStatement = ""
     val parseStatement =
-      s"""$indent1$indent2    parse_validate_fixed($field == $fixed, 
"$localName", pstate);
-         |$indent1$indent2    if (pstate->error) return;""".stripMargin
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &pstate->pu);
+         |$indent1$indent2    if (pstate->pu.error) return;""".stripMargin
+    val unparseStatement =
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &ustate->pu);
+         |$indent1$indent2    if (ustate->pu.error) return;""".stripMargin
+    cgState.addSimpleTypeStatements(initERDStatement, parseStatement, 
unparseStatement)
+  }
+
+  // Generate C code to validate a primitive element matches one of its 
enumeration values.
+  // May be replaced by a more specialized function in another trait for some 
types.
+  private def valueValidateEnumeration(
+    e: ElementBase,
+    deref: String,
+    enums: Array[String],
+    cgState: CodeGeneratorState,
+  ): Unit = {
+    val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+    val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+    val localName = cgState.cName(e)
+    val field = s"instance->$localName$deref"
+
+    // Extra initialization only for hexBinary enumerations
+    val arraysName = s"arrays_$localName"
+    val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+      s"{$arraysName[$index], ${s.length / 2}, false}"
+    }
+    val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ", 
", "}"))
+    val enumsLenMax = enums.map(_.length / 2).max
+    val arraysInit =
+      s"""$indent1$indent2    uint8_t $arraysName[][$enumsLenMax] = 
${hexEnumsInit.mkString(
+          "{",
+          ", ",
+          "}",
+        )};\n"""
+
+    val (enumsInit, extraInit, primType, valType, fieldArg) = 
e.optPrimType.get match {
+      case PrimType.Double | PrimType.Float => (enums, "", "double", 
"floatpt", field)
+      case PrimType.HexBinary => (hexEnums, arraysInit, "HexBinary", 
"hexbinary", s"&$field")
+      case _ => (enums, "", "int64_t", "integer", field)

Review Comment:
   Is there value in listing the integer types? I imagine this failing on 
things like Date/Time/Integer primitives. THough I guess those primitives would 
fail somewhere much earlier since they aren't supported?



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/errors.c:
##########
@@ -80,9 +80,11 @@ error_lookup(uint8_t code)
     static const ErrorLookup table[ERR_ZZZ] = {

Review Comment:
   This `ERR_ZZZ` confused me at first, but I guess it's used so that it sorts 
last in the enum array? Does something like `ERR__SIZE` (so two underscores) or 
`_ERR_SIZE` (underscore prefix) sort last? Maybe an extra underscore is too 
subtle? If a different name is agreed, doesn't have to be part of this PR since 
it's unrelated, just something I noticed and found confusing.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)
+{
+    if (count < minOccurs || count > maxOccurs)
+    {
+        // Array count is not within bounds, so report error
+        static Error error = {ERR_ARRAY_BOUNDS, {0}};
+        error.arg.s = name;
+        pustate->error = &error;
+    }
+}
+
+// Validate element's value is same as its fixed attribute
+
+void
+validate_fixed_attribute(bool same, const char *element, ParserOrUnparserState 
*pustate)
+{
+    if (!same)
+    {
+        // Element is not same as its fixed attribute, so diagnose problem
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_FIXED_VALUE, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+// Validate element's value matches an enumeration (either floating point or 
integer)
+
+void
+validate_floatpt_enumeration(double number, size_t num_enums, double enums[], 
const char *element,
+                             ParserOrUnparserState *pustate)
+{
+    bool match_found = false;
+    for (size_t i = 0; !match_found && i < num_enums; i++)
+    {
+        if (number == enums[i])
+        {
+            match_found = true;
+        }
+    }
+    if (!match_found)
+    {
+        // Number does not match any enumeration, so diagnose problem
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+void
+validate_hexbinary_enumeration(const HexBinary *hexBinary, size_t num_enums, 
HexBinary enums[],
+                               const char *element, ParserOrUnparserState 
*pustate)
+{
+    bool match_found = false;
+    for (size_t i = 0; !match_found && i < num_enums; i++)
+    {
+        bool same_lengths = hexBinary->lengthInBytes == enums[i].lengthInBytes;
+        if (same_lengths && memcmp(hexBinary->array, enums[i].array, 
hexBinary->lengthInBytes) == 0)
+        {
+            match_found = true;
+        }
+    }
+    if (!match_found)
+    {
+        // HexBinary does not match any enumeration value, so report error
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+void
+validate_integer_enumeration(int64_t number, size_t num_enums, int64_t 
enums[], const char *element,
+                             ParserOrUnparserState *pustate)
+{
+    bool match_found = false;
+    for (size_t i = 0; !match_found && i < num_enums; i++)
+    {
+        if (number == enums[i])
+        {
+            match_found = true;
+        }
+    }
+    if (!match_found)
+    {
+        // Number does not match any enumeration value, so report error
+        Diagnostics *diagnostics = get_diagnostics();
+        const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+        add_diagnostic(diagnostics, &error);
+        pustate->diagnostics = diagnostics;
+    }
+}
+
+// Validate element's value fits within its schema facets' range
+
+void
+validate_schema_range(bool within_range, const char *element, 
ParserOrUnparserState *pustate)
+{
+    if (!within_range)

Review Comment:
   I was expecting this to be similar to the above functions and accept 
something like an array of ranges as a parameter and found this confusing. But 
I now see that the generator creates a boolean expression that determines if a 
field is with range and passes the result of that expression here. That makes 
perfect sense in hindsight, but it might be useful to add a comment explaining 
this, since on first glance it feels odd for this function to not do the actual 
checking and is ultimately just a thing that adds a diagnostic. 



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -50,6 +51,53 @@ trait BinaryValueCodeGenerator {
     if (e.hasFixedValue && e.fixedValueAsString.nonEmpty) {
       validateFixed(deref)
     }
+
+    // Check if the element's value is restricted to a set of enumerations
+    if (e.typeDef.optRestriction.exists(_.hasEnumeration)) {
+      // Split the enumeration values into an array of strings ready to be 
inserted into the C code
+      val enumerationValues = 
e.typeDef.optRestriction.flatMap(_.enumerationValues).get
+      val enums = enumerationValues.split('|').map(_.replaceAll("\\\\(\\W)", 
"$1"))

Review Comment:
   I think maybe Daffodil generates a regex to check enumerations, so you're 
having undo that regex? Instead, I think you can access the raw enum values 
like this:
   ```scala
   val enums = e.typeDef.optRestriction.get.enumerations.map(_.enumValueRaw)
   ```



##########
README.md:
##########
@@ -59,32 +59,42 @@ Compile source code:
 
     sbt compile
 
-### Tests
+### Test
 
-Run unit tests:
+Check all unit tests pass:
 
     sbt test
 
-Run slower integration tests:
+Check all integration tests pass:
 
     sbt daffodil-test-integration/test
 
-### Command Line Interface
+### Format
 
-Build the command line interface (Linux and Windows shell scripts in
-`daffodil-cli/target/universal/stage/bin/`; see the [Command Line
-Interface] documentation for details on their usage):
+Check format of source and sbt files:
+
+    sbt scalafmtCheckAll scalafmtSbtCheck
+
+Reformat source and sbt files if necessary:
+
+    sbt scalafmtAll scalafmtSbt
+
+### Build

Review Comment:
   I wonder if we need another build section for building and locally 
publishing the jars? Anyone developing something integrating Daffodil may want 
that. Maybe something like this?
   
   ```
   Build the Daffodil jars and publish to the a Maven or Ivy repository
   
   Maven:
   
       sbt publishM2
   
   Ivy:
   
       sbt publishLocal
   ```



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)
+{
+    if (count < minOccurs || count > maxOccurs)
+    {
+        // Array count is not within bounds, so report error
+        static Error error = {ERR_ARRAY_BOUNDS, {0}};
+        error.arg.s = name;
+        pustate->error = &error;
+    }

Review Comment:
   Why does this set `pustate->error` but `validated_fixed_attribute` below 
adds a diagnostic?



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_main.c:
##########
@@ -107,17 +105,24 @@ main(int argc, char *argv[])
         output = fopen_or_exit(output, daffodil_unparse.outfile, "w");
 
         // Initialize our infoset's values from the XML data
-        const bool   CLEAR_INFOSET = true;
+        const bool CLEAR_INFOSET = true;
         InfosetBase *infoset = get_infoset(CLEAR_INFOSET);
-        XMLReader    xmlReader = {xmlReaderMethods, input, NULL, NULL};
+        XMLReader xmlReader = {xmlReaderMethods, input, NULL, NULL};
         error = walk_infoset((VisitEventHandler *)&xmlReader, infoset);
         continue_or_exit(error);
 
         // Unparse our infoset to the output file
-        UState ustate = {output, 0, NULL, NULL, 0, 0};
+        UState ustate = {{output, 0, NULL, NULL}, 0, 0};
         unparse_infoset(infoset, &ustate);
-        print_diagnostics(ustate.diagnostics);
-        continue_or_exit(ustate.error);
+        print_diagnostics(ustate.pu.diagnostics);
+        continue_or_exit(ustate.pu.error);
+
+        // Any diagnostics will fail the unparse if validate mode is on
+        if (daffodil_parse.validate && ustate.pu.diagnostics)

Review Comment:
   Should this be `daffodil_unparse.validate`? Or since getopt doesn't really 
support subcommand specific options, maybe just combine the 
`daffodil_parse_cli` and `daffodil_unparse_cli` struct into a single struct?



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)

Review Comment:
   Minor, thoughts on naming this just `pu`? It would be consistent with the 
`pu` name inside PState/UState so it's more clear its the same thing not 
something different kind of state? Also makes it more easily differentiated 
from a  `pstate` name if someone glosses over the `u`? Maybe `pu` is too short 
for a variable name though?



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
 
     val initERDStatement = ""
     val parseStatement =
-      s"""$indent1$indent2    parse_validate_fixed($field == $fixed, 
"$localName", pstate);
-         |$indent1$indent2    if (pstate->error) return;""".stripMargin
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &pstate->pu);
+         |$indent1$indent2    if (pstate->pu.error) return;""".stripMargin
+    val unparseStatement =
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &ustate->pu);
+         |$indent1$indent2    if (ustate->pu.error) return;""".stripMargin
+    cgState.addSimpleTypeStatements(initERDStatement, parseStatement, 
unparseStatement)
+  }
+
+  // Generate C code to validate a primitive element matches one of its 
enumeration values.
+  // May be replaced by a more specialized function in another trait for some 
types.
+  private def valueValidateEnumeration(
+    e: ElementBase,
+    deref: String,
+    enums: Array[String],
+    cgState: CodeGeneratorState,
+  ): Unit = {
+    val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+    val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+    val localName = cgState.cName(e)
+    val field = s"instance->$localName$deref"
+
+    // Extra initialization only for hexBinary enumerations
+    val arraysName = s"arrays_$localName"
+    val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+      s"{$arraysName[$index], ${s.length / 2}, false}"
+    }
+    val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ", 
", "}"))
+    val enumsLenMax = enums.map(_.length / 2).max

Review Comment:
   Suggest `hexEnumsLenMax` since this is specific to hex? Or move all the hex 
specific stuff to the HexBinary cases so we don't have to calculate this stuff 
where it doesn't apply.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.h:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef VALIDATORS_H
+#define VALIDATORS_H
+
+// clang-format off
+#include <stdbool.h>  // for bool
+#include <stddef.h>   // for size_t
+#include <stdint.h>   // for int64_t
+#include "infoset.h"  // for ParserOrUnparserState
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+extern void validate_array_bounds(const char *name, size_t count, size_t 
minOccurs, size_t maxOccurs,
+                                  ParserOrUnparserState *pustate);
+
+// Validate element's value is same as its fixed attribute
+
+extern void validate_fixed_attribute(bool same, const char *element, 
ParserOrUnparserState *pustate);
+
+// Validate element's value matches an enumeration (either floating point or 
integer)

Review Comment:
   Is this comment talking about all three below functions? It doesn't mention 
hexbinary. Should it?



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaDocument.scala:
##########
@@ -212,6 +212,8 @@ final class SchemaDocument private (xmlSDoc: 
XMLSchemaDocument)
   final override lazy val optLexicalParent = Some(xmlSDoc)
   final override lazy val optXMLSchemaDocument = Some(xmlSDoc)
 
+  final lazy val version = (xml \ "@version").text

Review Comment:
   I didn't know XSD had a version attribute. Coudld be useful!
   
   Though, I wonder if this should be an Option so we can tell if it's provided 
or not? In the code gen case, it can just to `optVersion.getOrElse("")` so it 
wouldn't change much.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
 
     val initERDStatement = ""
     val parseStatement =
-      s"""$indent1$indent2    parse_validate_fixed($field == $fixed, 
"$localName", pstate);
-         |$indent1$indent2    if (pstate->error) return;""".stripMargin
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &pstate->pu);
+         |$indent1$indent2    if (pstate->pu.error) return;""".stripMargin
+    val unparseStatement =
+      s"""$indent1$indent2    validate_fixed_attribute($field == $fixed, 
"$localName", &ustate->pu);
+         |$indent1$indent2    if (ustate->pu.error) return;""".stripMargin
+    cgState.addSimpleTypeStatements(initERDStatement, parseStatement, 
unparseStatement)
+  }
+
+  // Generate C code to validate a primitive element matches one of its 
enumeration values.
+  // May be replaced by a more specialized function in another trait for some 
types.
+  private def valueValidateEnumeration(
+    e: ElementBase,
+    deref: String,
+    enums: Array[String],
+    cgState: CodeGeneratorState,
+  ): Unit = {
+    val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+    val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+    val localName = cgState.cName(e)
+    val field = s"instance->$localName$deref"
+
+    // Extra initialization only for hexBinary enumerations
+    val arraysName = s"arrays_$localName"
+    val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+      s"{$arraysName[$index], ${s.length / 2}, false}"
+    }
+    val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ", 
", "}"))
+    val enumsLenMax = enums.map(_.length / 2).max
+    val arraysInit =
+      s"""$indent1$indent2    uint8_t $arraysName[][$enumsLenMax] = 
${hexEnumsInit.mkString(
+          "{",
+          ", ",
+          "}",
+        )};\n"""

Review Comment:
   Are there any concerns that there might be a bunch of hex binary enums with 
one being much larger than the other, and so this array would have a lot of 
empty space? I guess the other option would be to create a bunch of separate 
variables (e.g. hexEnum1, hexEnum2, hexEnum3, ... 0), maybe that's not worth 
it. Or alternatively, concat the hex arrays into a single long array, and then 
then hexEnums have to correctly index into that array? Maybe that's too much 
complexity for a case that isn't that likely? I imagine most hex enums are all 
the same length, or at least close in length.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add schema validation code to C generator [daffodil]

Reply via email to