stevedlawrence commented on code in PR #1098:
URL: https://github.com/apache/daffodil/pull/1098#discussion_r1366881382
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_getopt.c:
##########
@@ -85,6 +85,12 @@ parse_daffodil_cli(int argc, char *argv[])
daffodil_parse.outfile = optarg;
daffodil_unparse.outfile = optarg;
break;
+ case 'r':
+ // Ignore "-r root" option/optarg
+ break;
+ case 's':
+ // Ignore "-s schema" option/optarg
+ break;
Review Comment:
I'm not sure I understand this, why of options to just ignore them? Seems
that could be very confusing.
##########
README.md:
##########
@@ -59,32 +59,42 @@ Compile source code:
sbt compile
-### Tests
+### Test
-Run unit tests:
+Check all unit tests pass:
sbt test
-Run slower integration tests:
+Check all integration tests pass:
sbt daffodil-test-integration/test
-### Command Line Interface
+### Format
-Build the command line interface (Linux and Windows shell scripts in
-`daffodil-cli/target/universal/stage/bin/`; see the [Command Line
-Interface] documentation for details on their usage):
+Check format of source and sbt files:
+
+ sbt scalafmtCheckAll scalafmtSbtCheck
+
+Reformat source and sbt files if necessary:
+
+ sbt scalafmtAll scalafmtSbt
Review Comment:
Is it worth adding commands for what you use to format h C code?
Maybe for a future PR, it might be worth adding an sbt task that executes
those commands with the right options (e.g. `sbt cfmtAll`) so we don't have to
remember them. Could also be useful to add a step in our github actions lint
job to run that and error if git diff reports and changes.
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_main.c:
##########
@@ -21,8 +21,8 @@
#include <string.h> // for strcmp
#include "cli_errors.h" // for CLI_DIAGNOSTICS, CLI_FILE_CLOSE,
CLI_FILE_OPEN
#include "daffodil_getopt.h" // for daffodil_cli, daffodil_parse,
daffodil_parse_cli, parse_daffodil_cli, daffodil_unparse, daffodil_unparse_cli,
DAFFODIL_PARSE, DAFFODIL_UNPARSE
-#include "errors.h" // for continue_or_exit, print_diagnostics,
Error, Diagnostics, Error::(anonymous)
-#include "infoset.h" // for get_infoset, walk_infoset, PState,
UState, parse_data, unparse_infoset, InfosetBase, VisitEventHandler
+#include "errors.h" // for continue_or_exit, Diagnostics,
print_diagnostics, Error
+#include "infoset.h" // for ParserOrUnparserState, PState, UState,
get_infoset, walk_infoset, parse_data, unparse_infoset, InfosetBase,
VisitEventHandler
Review Comment:
Just curious, do you find it useful to include the functions each header
includes? Seems like it's just noise to me. I guess this is auto-generated so
not a big deal, but personally I think it would be nice if this could be
disabled. Especially since some of these lines get pretty long.
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/errors.c:
##########
@@ -80,9 +80,11 @@ error_lookup(uint8_t code)
static const ErrorLookup table[ERR_ZZZ] = {
{ERR_ARRAY_BOUNDS, "%s count out of bounds\n", FIELD_S},
{ERR_CHOICE_KEY, "no match between choice dispatch key %" PRId64 " and
any branch key\n", FIELD_D64},
+ {ERR_ENUM_MATCH, "value of element '%s' does not match any of its
enumerations\n", FIELD_S},
{ERR_FIXED_VALUE, "value of element '%s' does not match value of its
'fixed' attribute\n", FIELD_S},
{ERR_HEXBINARY_ALLOC, "error allocating hexBinary memory -- %" PRId64
" bytes\n", FIELD_D64},
{ERR_LEFTOVER_DATA, "Left over data, at least %i bit(s) remaining
after end of parse\n", FIELD_C},
+ {ERR_OUTSIDE_RANGE, "value of element '%s' is outside its allowed
range\n", FIELD_S},
Review Comment:
Just a minor naming suggestion, what about something like
`ERR_RESTRICTION_ENUM`, `ERR_RESTRICTION_RANGE` so that restriction errors are
grouped together? Might be more useful if the other checks are ever added, e.g.
pattern, lengths, digits, etc.
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h> // for bool, false, true
+#include <string.h> // for memcmp
+#include "errors.h" // for add_diagnostic, get_diagnostics, Error,
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE,
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t
maxOccurs,
+ ParserOrUnparserState *pustate)
+{
+ if (count < minOccurs || count > maxOccurs)
+ {
+ // Array count is not within bounds, so report error
+ static Error error = {ERR_ARRAY_BOUNDS, {0}};
+ error.arg.s = name;
+ pustate->error = &error;
+ }
+}
+
+// Validate element's value is same as its fixed attribute
+
+void
+validate_fixed_attribute(bool same, const char *element, ParserOrUnparserState
*pustate)
+{
+ if (!same)
+ {
+ // Element is not same as its fixed attribute, so diagnose problem
+ Diagnostics *diagnostics = get_diagnostics();
+ const Error error = {ERR_FIXED_VALUE, {.s = element}};
+
+ add_diagnostic(diagnostics, &error);
+ pustate->diagnostics = diagnostics;
+ }
+}
+
+// Validate element's value matches an enumeration (either floating point or
integer)
Review Comment:
Looking at the code, this is used only for double and float primitives, it's
not used for integer types as the comment suggests. And I guess that could be
possible for <= 32-bit ints which can be exactly represented by doubles, I
thought you might be doing that at first.
##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/DaffodilCCodeGenerator.scala:
##########
@@ -259,6 +260,7 @@ object DaffodilCCodeGenerator
def generateCode(gram: Gram, cgState: CodeGeneratorState): Unit = {
gram match {
case g: AlignmentFill => alignmentFillGenerateCode(g, cgState)
+ case g: AssertBooleanPrim => noop(g)
Review Comment:
Is there value in having a `noop()` variant that creates a warning? Some
schemas might rely on assertions for backtracking to work correctly, so getting
a warning that says something like "this schema uses something that is not
supported by the Code generator" might be useful to avoid unexpected issue.
Some grammars probably wouldn't need a warning (e.g. CaptureContentLength.*),
but some like these might be more useful?
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_getopt.h:
##########
@@ -42,7 +42,7 @@ extern struct daffodil_parse_cli
const char *infoset_converter;
const char *infile;
const char *outfile;
- bool validate;
+ bool validate;
Review Comment:
I like this change :+1: . I've never been a huge fan of aligning variable
names in C.
##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
val initERDStatement = ""
val parseStatement =
- s"""$indent1$indent2 parse_validate_fixed($field == $fixed,
"$localName", pstate);
- |$indent1$indent2 if (pstate->error) return;""".stripMargin
+ s"""$indent1$indent2 validate_fixed_attribute($field == $fixed,
"$localName", &pstate->pu);
+ |$indent1$indent2 if (pstate->pu.error) return;""".stripMargin
+ val unparseStatement =
+ s"""$indent1$indent2 validate_fixed_attribute($field == $fixed,
"$localName", &ustate->pu);
+ |$indent1$indent2 if (ustate->pu.error) return;""".stripMargin
+ cgState.addSimpleTypeStatements(initERDStatement, parseStatement,
unparseStatement)
+ }
+
+ // Generate C code to validate a primitive element matches one of its
enumeration values.
+ // May be replaced by a more specialized function in another trait for some
types.
+ private def valueValidateEnumeration(
+ e: ElementBase,
+ deref: String,
+ enums: Array[String],
+ cgState: CodeGeneratorState,
+ ): Unit = {
+ val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+ val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+ val localName = cgState.cName(e)
+ val field = s"instance->$localName$deref"
+
+ // Extra initialization only for hexBinary enumerations
+ val arraysName = s"arrays_$localName"
+ val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+ s"{$arraysName[$index], ${s.length / 2}, false}"
+ }
+ val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ",
", "}"))
+ val enumsLenMax = enums.map(_.length / 2).max
+ val arraysInit =
+ s"""$indent1$indent2 uint8_t $arraysName[][$enumsLenMax] =
${hexEnumsInit.mkString(
+ "{",
+ ", ",
+ "}",
+ )};\n"""
+
+ val (enumsInit, extraInit, primType, valType, fieldArg) =
e.optPrimType.get match {
+ case PrimType.Double | PrimType.Float => (enums, "", "double",
"floatpt", field)
+ case PrimType.HexBinary => (hexEnums, arraysInit, "HexBinary",
"hexbinary", s"&$field")
+ case _ => (enums, "", "int64_t", "integer", field)
Review Comment:
Is there value in listing the integer types? I imagine this failing on
things like Date/Time/Integer primitives. THough I guess those primitives would
fail somewhere much earlier since they aren't supported?
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/errors.c:
##########
@@ -80,9 +80,11 @@ error_lookup(uint8_t code)
static const ErrorLookup table[ERR_ZZZ] = {
Review Comment:
This `ERR_ZZZ` confused me at first, but I guess it's used so that it sorts
last in the enum array? Does something like `ERR__SIZE` (so two underscores) or
`_ERR_SIZE` (underscore prefix) sort last? Maybe an extra underscore is too
subtle? If a different name is agreed, doesn't have to be part of this PR since
it's unrelated, just something I noticed and found confusing.
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h> // for bool, false, true
+#include <string.h> // for memcmp
+#include "errors.h" // for add_diagnostic, get_diagnostics, Error,
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE,
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t
maxOccurs,
+ ParserOrUnparserState *pustate)
+{
+ if (count < minOccurs || count > maxOccurs)
+ {
+ // Array count is not within bounds, so report error
+ static Error error = {ERR_ARRAY_BOUNDS, {0}};
+ error.arg.s = name;
+ pustate->error = &error;
+ }
+}
+
+// Validate element's value is same as its fixed attribute
+
+void
+validate_fixed_attribute(bool same, const char *element, ParserOrUnparserState
*pustate)
+{
+ if (!same)
+ {
+ // Element is not same as its fixed attribute, so diagnose problem
+ Diagnostics *diagnostics = get_diagnostics();
+ const Error error = {ERR_FIXED_VALUE, {.s = element}};
+
+ add_diagnostic(diagnostics, &error);
+ pustate->diagnostics = diagnostics;
+ }
+}
+
+// Validate element's value matches an enumeration (either floating point or
integer)
+
+void
+validate_floatpt_enumeration(double number, size_t num_enums, double enums[],
const char *element,
+ ParserOrUnparserState *pustate)
+{
+ bool match_found = false;
+ for (size_t i = 0; !match_found && i < num_enums; i++)
+ {
+ if (number == enums[i])
+ {
+ match_found = true;
+ }
+ }
+ if (!match_found)
+ {
+ // Number does not match any enumeration, so diagnose problem
+ Diagnostics *diagnostics = get_diagnostics();
+ const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+ add_diagnostic(diagnostics, &error);
+ pustate->diagnostics = diagnostics;
+ }
+}
+
+void
+validate_hexbinary_enumeration(const HexBinary *hexBinary, size_t num_enums,
HexBinary enums[],
+ const char *element, ParserOrUnparserState
*pustate)
+{
+ bool match_found = false;
+ for (size_t i = 0; !match_found && i < num_enums; i++)
+ {
+ bool same_lengths = hexBinary->lengthInBytes == enums[i].lengthInBytes;
+ if (same_lengths && memcmp(hexBinary->array, enums[i].array,
hexBinary->lengthInBytes) == 0)
+ {
+ match_found = true;
+ }
+ }
+ if (!match_found)
+ {
+ // HexBinary does not match any enumeration value, so report error
+ Diagnostics *diagnostics = get_diagnostics();
+ const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+ add_diagnostic(diagnostics, &error);
+ pustate->diagnostics = diagnostics;
+ }
+}
+
+void
+validate_integer_enumeration(int64_t number, size_t num_enums, int64_t
enums[], const char *element,
+ ParserOrUnparserState *pustate)
+{
+ bool match_found = false;
+ for (size_t i = 0; !match_found && i < num_enums; i++)
+ {
+ if (number == enums[i])
+ {
+ match_found = true;
+ }
+ }
+ if (!match_found)
+ {
+ // Number does not match any enumeration value, so report error
+ Diagnostics *diagnostics = get_diagnostics();
+ const Error error = {ERR_ENUM_MATCH, {.s = element}};
+
+ add_diagnostic(diagnostics, &error);
+ pustate->diagnostics = diagnostics;
+ }
+}
+
+// Validate element's value fits within its schema facets' range
+
+void
+validate_schema_range(bool within_range, const char *element,
ParserOrUnparserState *pustate)
+{
+ if (!within_range)
Review Comment:
I was expecting this to be similar to the above functions and accept
something like an array of ranges as a parameter and found this confusing. But
I now see that the generator creates a boolean expression that determines if a
field is with range and passes the result of that expression here. That makes
perfect sense in hindsight, but it might be useful to add a comment explaining
this, since on first glance it feels odd for this function to not do the actual
checking and is ultimately just a thing that adds a diagnostic.
##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -50,6 +51,53 @@ trait BinaryValueCodeGenerator {
if (e.hasFixedValue && e.fixedValueAsString.nonEmpty) {
validateFixed(deref)
}
+
+ // Check if the element's value is restricted to a set of enumerations
+ if (e.typeDef.optRestriction.exists(_.hasEnumeration)) {
+ // Split the enumeration values into an array of strings ready to be
inserted into the C code
+ val enumerationValues =
e.typeDef.optRestriction.flatMap(_.enumerationValues).get
+ val enums = enumerationValues.split('|').map(_.replaceAll("\\\\(\\W)",
"$1"))
Review Comment:
I think maybe Daffodil generates a regex to check enumerations, so you're
having undo that regex? Instead, I think you can access the raw enum values
like this:
```scala
val enums = e.typeDef.optRestriction.get.enumerations.map(_.enumValueRaw)
```
##########
README.md:
##########
@@ -59,32 +59,42 @@ Compile source code:
sbt compile
-### Tests
+### Test
-Run unit tests:
+Check all unit tests pass:
sbt test
-Run slower integration tests:
+Check all integration tests pass:
sbt daffodil-test-integration/test
-### Command Line Interface
+### Format
-Build the command line interface (Linux and Windows shell scripts in
-`daffodil-cli/target/universal/stage/bin/`; see the [Command Line
-Interface] documentation for details on their usage):
+Check format of source and sbt files:
+
+ sbt scalafmtCheckAll scalafmtSbtCheck
+
+Reformat source and sbt files if necessary:
+
+ sbt scalafmtAll scalafmtSbt
+
+### Build
Review Comment:
I wonder if we need another build section for building and locally
publishing the jars? Anyone developing something integrating Daffodil may want
that. Maybe something like this?
```
Build the Daffodil jars and publish to the a Maven or Ivy repository
Maven:
sbt publishM2
Ivy:
sbt publishLocal
```
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h> // for bool, false, true
+#include <string.h> // for memcmp
+#include "errors.h" // for add_diagnostic, get_diagnostics, Error,
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE,
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t
maxOccurs,
+ ParserOrUnparserState *pustate)
+{
+ if (count < minOccurs || count > maxOccurs)
+ {
+ // Array count is not within bounds, so report error
+ static Error error = {ERR_ARRAY_BOUNDS, {0}};
+ error.arg.s = name;
+ pustate->error = &error;
+ }
Review Comment:
Why does this set `pustate->error` but `validated_fixed_attribute` below
adds a diagnostic?
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_main.c:
##########
@@ -107,17 +105,24 @@ main(int argc, char *argv[])
output = fopen_or_exit(output, daffodil_unparse.outfile, "w");
// Initialize our infoset's values from the XML data
- const bool CLEAR_INFOSET = true;
+ const bool CLEAR_INFOSET = true;
InfosetBase *infoset = get_infoset(CLEAR_INFOSET);
- XMLReader xmlReader = {xmlReaderMethods, input, NULL, NULL};
+ XMLReader xmlReader = {xmlReaderMethods, input, NULL, NULL};
error = walk_infoset((VisitEventHandler *)&xmlReader, infoset);
continue_or_exit(error);
// Unparse our infoset to the output file
- UState ustate = {output, 0, NULL, NULL, 0, 0};
+ UState ustate = {{output, 0, NULL, NULL}, 0, 0};
unparse_infoset(infoset, &ustate);
- print_diagnostics(ustate.diagnostics);
- continue_or_exit(ustate.error);
+ print_diagnostics(ustate.pu.diagnostics);
+ continue_or_exit(ustate.pu.error);
+
+ // Any diagnostics will fail the unparse if validate mode is on
+ if (daffodil_parse.validate && ustate.pu.diagnostics)
Review Comment:
Should this be `daffodil_unparse.validate`? Or since getopt doesn't really
support subcommand specific options, maybe just combine the
`daffodil_parse_cli` and `daffodil_unparse_cli` struct into a single struct?
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h> // for bool, false, true
+#include <string.h> // for memcmp
+#include "errors.h" // for add_diagnostic, get_diagnostics, Error,
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE,
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t
maxOccurs,
+ ParserOrUnparserState *pustate)
Review Comment:
Minor, thoughts on naming this just `pu`? It would be consistent with the
`pu` name inside PState/UState so it's more clear its the same thing not
something different kind of state? Also makes it more easily differentiated
from a `pstate` name if someone glosses over the `u`? Maybe `pu` is too short
for a variable name though?
##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
val initERDStatement = ""
val parseStatement =
- s"""$indent1$indent2 parse_validate_fixed($field == $fixed,
"$localName", pstate);
- |$indent1$indent2 if (pstate->error) return;""".stripMargin
+ s"""$indent1$indent2 validate_fixed_attribute($field == $fixed,
"$localName", &pstate->pu);
+ |$indent1$indent2 if (pstate->pu.error) return;""".stripMargin
+ val unparseStatement =
+ s"""$indent1$indent2 validate_fixed_attribute($field == $fixed,
"$localName", &ustate->pu);
+ |$indent1$indent2 if (ustate->pu.error) return;""".stripMargin
+ cgState.addSimpleTypeStatements(initERDStatement, parseStatement,
unparseStatement)
+ }
+
+ // Generate C code to validate a primitive element matches one of its
enumeration values.
+ // May be replaced by a more specialized function in another trait for some
types.
+ private def valueValidateEnumeration(
+ e: ElementBase,
+ deref: String,
+ enums: Array[String],
+ cgState: CodeGeneratorState,
+ ): Unit = {
+ val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+ val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+ val localName = cgState.cName(e)
+ val field = s"instance->$localName$deref"
+
+ // Extra initialization only for hexBinary enumerations
+ val arraysName = s"arrays_$localName"
+ val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+ s"{$arraysName[$index], ${s.length / 2}, false}"
+ }
+ val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ",
", "}"))
+ val enumsLenMax = enums.map(_.length / 2).max
Review Comment:
Suggest `hexEnumsLenMax` since this is specific to hex? Or move all the hex
specific stuff to the HexBinary cases so we don't have to calculate this stuff
where it doesn't apply.
##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.h:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef VALIDATORS_H
+#define VALIDATORS_H
+
+// clang-format off
+#include <stdbool.h> // for bool
+#include <stddef.h> // for size_t
+#include <stdint.h> // for int64_t
+#include "infoset.h" // for ParserOrUnparserState
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+extern void validate_array_bounds(const char *name, size_t count, size_t
minOccurs, size_t maxOccurs,
+ ParserOrUnparserState *pustate);
+
+// Validate element's value is same as its fixed attribute
+
+extern void validate_fixed_attribute(bool same, const char *element,
ParserOrUnparserState *pustate);
+
+// Validate element's value matches an enumeration (either floating point or
integer)
Review Comment:
Is this comment talking about all three below functions? It doesn't mention
hexbinary. Should it?
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaDocument.scala:
##########
@@ -212,6 +212,8 @@ final class SchemaDocument private (xmlSDoc:
XMLSchemaDocument)
final override lazy val optLexicalParent = Some(xmlSDoc)
final override lazy val optXMLSchemaDocument = Some(xmlSDoc)
+ final lazy val version = (xml \ "@version").text
Review Comment:
I didn't know XSD had a version attribute. Coudld be useful!
Though, I wonder if this should be an Option so we can tell if it's provided
or not? In the code gen case, it can just to `optVersion.getOrElse("")` so it
wouldn't change much.
##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/BinaryValueCodeGenerator.scala:
##########
@@ -93,11 +141,94 @@ trait BinaryValueCodeGenerator {
val initERDStatement = ""
val parseStatement =
- s"""$indent1$indent2 parse_validate_fixed($field == $fixed,
"$localName", pstate);
- |$indent1$indent2 if (pstate->error) return;""".stripMargin
+ s"""$indent1$indent2 validate_fixed_attribute($field == $fixed,
"$localName", &pstate->pu);
+ |$indent1$indent2 if (pstate->pu.error) return;""".stripMargin
+ val unparseStatement =
+ s"""$indent1$indent2 validate_fixed_attribute($field == $fixed,
"$localName", &ustate->pu);
+ |$indent1$indent2 if (ustate->pu.error) return;""".stripMargin
+ cgState.addSimpleTypeStatements(initERDStatement, parseStatement,
unparseStatement)
+ }
+
+ // Generate C code to validate a primitive element matches one of its
enumeration values.
+ // May be replaced by a more specialized function in another trait for some
types.
+ private def valueValidateEnumeration(
+ e: ElementBase,
+ deref: String,
+ enums: Array[String],
+ cgState: CodeGeneratorState,
+ ): Unit = {
+ val indent1 = if (cgState.hasChoice) INDENT else NO_INDENT
+ val indent2 = if (deref.nonEmpty) INDENT else NO_INDENT
+ val localName = cgState.cName(e)
+ val field = s"instance->$localName$deref"
+
+ // Extra initialization only for hexBinary enumerations
+ val arraysName = s"arrays_$localName"
+ val hexEnums = enums.zipWithIndex.map { case (s, index) =>
+ s"{$arraysName[$index], ${s.length / 2}, false}"
+ }
+ val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ",
", "}"))
+ val enumsLenMax = enums.map(_.length / 2).max
+ val arraysInit =
+ s"""$indent1$indent2 uint8_t $arraysName[][$enumsLenMax] =
${hexEnumsInit.mkString(
+ "{",
+ ", ",
+ "}",
+ )};\n"""
Review Comment:
Are there any concerns that there might be a bunch of hex binary enums with
one being much larger than the other, and so this array would have a lot of
empty space? I guess the other option would be to create a bunch of separate
variables (e.g. hexEnum1, hexEnum2, hexEnum3, ... 0), maybe that's not worth
it. Or alternatively, concat the hex arrays into a single long array, and then
then hexEnums have to correctly index into that array? Maybe that's too much
complexity for a case that isn't that likely? I imagine most hex enums are all
the same length, or at least close in length.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]