Re: [PR] Add schema validation code to C generator [daffodil]

via GitHub Sat, 21 Oct 2023 06:57:21 -0700


stevedlawrence commented on code in PR #1098:
URL: https://github.com/apache/daffodil/pull/1098#discussion_r1367730666



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libcli/daffodil_getopt.c:
##########
@@ -85,6 +85,12 @@ parse_daffodil_cli(int argc, char *argv[])
             daffodil_parse.outfile = optarg;
             daffodil_unparse.outfile = optarg;
             break;
+        case 'r':
+            // Ignore "-r root" option/optarg
+            break;
+        case 's':
+            // Ignore "-s schema" option/optarg
+            break;

Review Comment:
   I see, makes sense. As long as the optinos aren't in the usage I think it's 
fine. But it would be worth adding that reasoning as a comment.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaDocument.scala:
##########
@@ -212,6 +212,8 @@ final class SchemaDocument private (xmlSDoc: 
XMLSchemaDocument)
   final override lazy val optLexicalParent = Some(xmlSDoc)
   final override lazy val optXMLSchemaDocument = Some(xmlSDoc)
 
+  final lazy val version = (xml \ "@version").text

Review Comment:
   > How do you change the line to return an Option[String]?
   
   `(xml \ "@version")` returns a `NodeSeq`, which has an implicit conversion 
to `Seq[Node]`, so you can use any `Seq` functions on it. So something like 
this should work:
   
   ```scala
   final lazy val optVersion = (xml \ "@version").headOption.map(_.text)
   ```



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/errors.c:
##########
@@ -80,9 +80,11 @@ error_lookup(uint8_t code)
     static const ErrorLookup table[ERR_ZZZ] = {
         {ERR_ARRAY_BOUNDS, "%s count out of bounds\n", FIELD_S},
         {ERR_CHOICE_KEY, "no match between choice dispatch key %" PRId64 " and 
any branch key\n", FIELD_D64},
+        {ERR_ENUM_MATCH, "value of element '%s' does not match any of its 
enumerations\n", FIELD_S},
         {ERR_FIXED_VALUE, "value of element '%s' does not match value of its 
'fixed' attribute\n", FIELD_S},
         {ERR_HEXBINARY_ALLOC, "error allocating hexBinary memory -- %" PRId64 
" bytes\n", FIELD_D64},
         {ERR_LEFTOVER_DATA, "Left over data, at least %i bit(s) remaining 
after end of parse\n", FIELD_C},
+        {ERR_OUTSIDE_RANGE, "value of element '%s' is outside its allowed 
range\n", FIELD_S},

Review Comment:
   I think some of those can actually be used on numeric types, at least I'm 
pretty sure pattern does. Though, I'm not sure off hand, I'd have to dig 
through the XSD spec--I can't find a concise resource right now. But maybe 
codegen-c will support strings and we'll want to add the others.



##########
README.md:
##########
@@ -59,32 +59,42 @@ Compile source code:
 
     sbt compile
 
-### Tests
+### Test
 
-Run unit tests:
+Check all unit tests pass:
 
     sbt test
 
-Run slower integration tests:
+Check all integration tests pass:
 
     sbt daffodil-test-integration/test
 
-### Command Line Interface
+### Format
 
-Build the command line interface (Linux and Windows shell scripts in
-`daffodil-cli/target/universal/stage/bin/`; see the [Command Line
-Interface] documentation for details on their usage):
+Check format of source and sbt files:
+
+    sbt scalafmtCheckAll scalafmtSbtCheck
+
+Reformat source and sbt files if necessary:
+
+    sbt scalafmtAll scalafmtSbt

Review Comment:
   Makes sense. As long as the error is something useful that should be fine. I 
recommend doing it later though, doesn't have to be part of this PR.



##########
BUILD.md:
##########
@@ -50,9 +50,9 @@ following its website's instructions. This is necessary to 
install the
 [libmxml-devel][Mini-XML] package.
 
 You can use the `dnf` package manager to install most of the tools
-needed to build Daffodil:
+used to develop Daffodil:
 
-    sudo dnf install clang gcc git java-11-openjdk-devel llvm make mxml-devel 
pkgconf
+    sudo dnf install clang gcc git iwyu java-11-openjdk-devel llvm make 
mxml-devel pkgconf
 
 If you want to use clang instead of gcc, you'll have to set your

Review Comment:
   Yeah, gcc is GPLv3 which is category X, so I had concerns about implying it 
was required. I did find this open Jira LEGAL issue that discusses this:
   
   
https://issues.apache.org/jira/browse/LEGAL-390?jql=project+%3D+LEGAL+AND+text+%7E+%22gcc%22+ORDER+BY+updated+DESC%2C+priority+DESC
   
   Sounds like the resolution it's it's fine to use. I'd suggest we still do 
mention clang support so that people can use it if they want and don't have to 
figure it all out. But maybe that's an addendum or something, and everything 
else implies the use of gcc if the goal is to simplify this?



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/errors.c:
##########
@@ -80,9 +80,11 @@ error_lookup(uint8_t code)
     static const ErrorLookup table[ERR_ZZZ] = {

Review Comment:
   ERR__NUM_CODES sounds good to me. Also, doesn't have to be one in this PR an 
can be done at a later date. This PR already has a few different things going 
on.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/tests/bits.c:
##########
@@ -20,29 +20,29 @@
 #include <criterion/new/assert.h>  // for CRI_ASSERT_OP_VAR_TAGGED, 
CRI_ASSERT_OP_TYPE_TAGGED, CRI_ASSERT_OP_VAL_TAGGED, CRI_ASSERT_TEST_TAG_u8, 
cr_user_u8_tostr, CRI_ASSERT_OP_MKNODE_TAGGED, CRI_ASSERT_OP_NAME_TAGGED, 
CRI_ASSERT_TEST_TAG_sz, cr_user_sz_tostr, CRI_ASSERT_TYPE_TAG_ID_u8, 
cr_user_u8_eq, CRI_ASSERT_TEST_TAG_ptr, cr_user_ptr_tostr, CRI_ASSERT_MKLIST_2, 
CRI_ASSERT_SPECIFIER_OP1, CRI_ASSERT_SPECIFIER_eq, CRI_ASSERT_SPEC_OPLEN_2, 
CRI_ASSERT_TEST_SPECIFIER_eq, cr_expect, CRI_ASSERT_TYPE_TAG_u8, 
CRI_ASSERT_TYPE_TAG_ID_sz, cr_user_sz_eq, CRI_ASSERT_TYPE_TAG_sz, 
CRI_ASSERT_TYPE_TAG_ID_ptr, cr_user_ptr_eq, CRI_ASSERT_TEST_TAGC_u8, 
CRI_ASSERT_TEST_TAG_int, cr_user_int_tostr, CRI_ASSERT_TYPE_TAG_ptr, 
CRI_ASSERT_TEST_TAGC_sz, CRI_ASSERT_TYPE_TAG_ID_int, cr_user_int_eq, 
CRI_ASSERT_TEST_TAGC_ptr, CRI_ASSERT_TYPE_TAG_int, CRI_ASSERT_TEST_TAG_i16, 
CRI_ASSERT_TEST_TAG_i32, CRI_ASSERT_TEST_TAG_i64, CRI_ASSERT_TEST_TAG_i8, 
CRI_ASSERT_TEST_TAG_u16, CRI_ASSERT_TEST_TAG_u32, CRI_ASSERT_TE..
 .

Review Comment:
   Does this then cause it to wrap, or does it jus cut it off? I assume wrap? 
Seems like that's just going to cause a large number of comment lines of 
clutter. At least when it's on one line it's easier to ignore if you don't care 
about it.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/tests/bits.c:
##########
@@ -125,25 +125,25 @@ Test(bits, be_bool_4)
 Test(bits, be_bool_7_7)
 {
     // Open memory stream for writing to dynamic buffer
-    char *   buffer = NULL;
-    size_t   size = 0;
-    FILE *   stream = open_memstream(&buffer, &size);
-    UState   ustate = {stream, 0, NULL, NULL, 0, 0};
+    char *buffer = NULL;
+    size_t size = 0;
+    FILE *stream = open_memstream(&buffer, &size);
+    UState ustate = {{stream, 0, NULL, NULL}, 0, 0};
     uint32_t true7_rep = 0146; // 0b_1_100_110
     uint32_t false_rep = 0157; // 0b_1_101_111
 
     // Verify that ustate writes 11001101
     unparse_be_bool(true, 7, true7_rep, false_rep, &ustate);
     fflush(stream);
-    cr_expect(eq(ptr, (void *)ustate.error, 0), "ustate should have no error");
-    cr_expect(eq(sz, ustate.bitPos0b, 7), "ustate should advance 7 bits");
+    cr_expect(eq(ptr, (void *)ustate.pu.error, 0), "ustate should have no 
error");

Review Comment:
   Agreed :+1: 



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/tests/bits.c:
##########


Review Comment:
   I guess it doesn't hurt if these files end up in the jar? They can't 
conflict in anyway with the generated output and the runtime lib that is also 
copied out of the jar?



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/DaffodilCCodeGenerator.scala:
##########
@@ -259,6 +260,7 @@ object DaffodilCCodeGenerator
   def generateCode(gram: Gram, cgState: CodeGeneratorState): Unit = {
     gram match {
       case g: AlignmentFill => alignmentFillGenerateCode(g, cgState)
+      case g: AssertBooleanPrim => noop(g)

Review Comment:
   That's fine, not critical for this PR.



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/validators.c:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// clang-format off
+#include "validators.h"
+#include <stdbool.h>  // for bool, false, true
+#include <string.h>   // for memcmp
+#include "errors.h"   // for add_diagnostic, get_diagnostics, Error, 
Diagnostics, ERR_ENUM_MATCH, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, 
ERR_OUTSIDE_RANGE, Error::(anonymous)
+// clang-format on
+
+// Validate element's array count is within its array bounds
+
+void
+validate_array_bounds(const char *name, size_t count, size_t minOccurs, size_t 
maxOccurs,
+                      ParserOrUnparserState *pustate)
+{
+    if (count < minOccurs || count > maxOccurs)
+    {
+        // Array count is not within bounds, so report error
+        static Error error = {ERR_ARRAY_BOUNDS, {0}};
+        error.arg.s = name;
+        pustate->error = &error;
+    }

Review Comment:
   Makes sense, might be worth adding a comment to that affect since it's 
different from the other validators.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add schema validation code to C generator [daffodil]

Reply via email to