[GitHub] [daffodil] mbeckerle commented on a diff in pull request #901: Add more daffodilC support (arrays, alignment)

GitBox Wed, 18 Jan 2023 14:56:22 -0800


mbeckerle commented on code in PR #901:
URL: https://github.com/apache/daffodil/pull/901#discussion_r1080548480



##########
daffodil-runtime2/src/test/resources/org/apache/daffodil/runtime2/variablelen.dfdl.xsd:
##########
@@ -0,0 +1,121 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<xs:schema
+ elementFormDefault="qualified"
+ xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/";
+ xmlns:xs="http://www.w3.org/2001/XMLSchema";>
+
+  <!-- Binary data format bindings -->
+
+  <xs:include 
schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+  <xs:annotation>
+    <xs:appinfo source="http://www.ogf.org/dfdl/";>
+      <dfdl:format
+        ref="GeneralFormat"
+        binaryBooleanFalseRep="0"
+        binaryBooleanTrueRep="1"
+        fillByte="%NUL;"
+        prefixIncludesPrefixLength="no"
+        representation="binary"/>
+    </xs:appinfo>
+  </xs:annotation>
+
+  <!-- Types for testing -->
+
+  <!-- Unsuitable since variable length array must allow 0 to 16 numbers -->
+  <xs:complexType name="fixedType">
+    <xs:sequence>
+      <xs:element name="before" type="xs:unsignedInt" />
+      <xs:element name="variablelen_size" type="xs:unsignedInt" />
+      <xs:element name="variablelen" type="xs:unsignedInt"
+                  minOccurs="2" maxOccurs="2"
+                  dfdl:occursCountKind="fixed" />
+      <!-- Yes, can have "after" element after "fixed" array -->
+      <xs:element name="after" type="xs:unsignedInt"
+                  minOccurs="2" maxOccurs="2"
+                  dfdl:occursCountKind="fixed" />
+    </xs:sequence>
+  </xs:complexType>
+
+  <!-- Unsuitable since "implicit" works only at end of binary data -->
+  <xs:complexType name="implicitType">
+    <xs:sequence>
+      <xs:element name="before" type="xs:unsignedInt" />
+      <xs:element name="variablelen_size" type="xs:unsignedInt" />
+      <xs:element name="variablelen" type="xs:unsignedInt"
+                  minOccurs="0" maxOccurs="16"
+                  dfdl:occursCountKind="implicit" />
+      <!-- No, cannot have "after" element after "implicit" array -->
+    </xs:sequence>
+  </xs:complexType>
+
+  <!-- Unsuitable since "parsed" works only at end of binary data -->
+  <xs:complexType name="parsedType">
+    <xs:sequence>
+      <xs:element name="before" type="xs:unsignedInt" />
+      <xs:element name="variablelen_size" type="xs:unsignedInt" />
+      <xs:element name="variablelen" type="xs:unsignedInt"
+                  minOccurs="0" maxOccurs="16"
+                  dfdl:occursCountKind="parsed" />
+      <!-- No, cannot have "after" element after "parsed" array -->
+    </xs:sequence>
+  </xs:complexType>
+
+  <!-- Suitable, although need to support variable length arrays in C -->
+  <xs:complexType name="expressionType">
+    <xs:sequence>
+      <xs:element name="before" type="xs:unsignedInt" />
+      <xs:element name="variablelen_size" type="xs:unsignedInt" />
+      <xs:element name="variablelen" type="xs:unsignedInt"
+                  minOccurs="0" maxOccurs="16"

Review Comment:
   So the representation you are supporting in the data stream requires the 
array to have a maxOccurs, and to not exceed it because the Infoset is going to 
pre-allocate space for the maximum?
   
   This makes sense, for now but you need some tunable limit above-which 
warnings are issued to users. Otherwise users will balk at unbounded, and just 
change it to 1000000000. Or perhaps it's just a hard limit on the max size?
   
   We do need to support actually dynamically sized arrays eventually, but 
we'll need to discuss how to do an efficient infoset for them. 



##########
daffodil-runtime2/src/test/c/examples/variablelen/generated_code.h:
##########
@@ -0,0 +1,22 @@
+#ifndef GENERATED_CODE_H
+#define GENERATED_CODE_H
+
+// clang-format off
+#include <stdbool.h>  // for bool
+#include <stddef.h>   // for size_t
+#include <stdint.h>   // for uint8_t, int16_t, int32_t, int64_t, uint32_t, 
int8_t, uint16_t, uint64_t
+#include "infoset.h"  // for InfosetBase, HexBinary
+// clang-format on
+
+// Define infoset structures
+
+typedef struct expressionElement_
+{
+    InfosetBase _base;
+    uint32_t    before;
+    uint32_t    variablelen_size;
+    uint32_t    variablelen[16];

Review Comment:
   We're going to eventually need an infosetKind tag field in here to specify 
which of several representations are being used for this array. I think one 
should be able to have either directly inline (as you have it here), or a 
separately allocated structure (for larger arrays) where instead of elements 
here, this would reference the other object for the array slots. 



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.c:
##########
@@ -115,54 +123,47 @@ get_erd_ns(const ERD *erd)
     return erd->namedQName.ns;
 }
 
-// walkInfosetNode - recursively walk an infoset node and call
+// walkInfoset - walk each node of an infoset and call
+// VisitEventHandler methods
+
+const Error *
+walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode)
+{
+    const Error *error = handler->visitStartDocument(handler);
+
+    if (!error)
+    {
+        error = walkInfosetNode(handler, infoNode);
+    }
+    if (!error)
+    {
+        error = handler->visitEndDocument(handler);
+    }
+
+    return error;
+}
+
+// walkInfosetNode - walk each child of an infoset node and call
 // VisitEventHandler methods
 
 static const Error *
 walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode)
 {
-    const size_t      count = infoNode->erd->numChildren;
-    const ERD **const childrenERDs = infoNode->erd->childrenERDs;
-    const size_t *    offsets = infoNode->erd->offsets;
+    const size_t            numChildren = infoNode->erd->numChildren;
+    const ERD *const *const childrenERDs = infoNode->erd->childrenERDs;
+    const size_t *          offsets = infoNode->erd->offsets;

Review Comment:
   offsets to what?
   
   For naming consistency should this be childrenOffsets? 
   
   Or are these offsets to non-element sub-structures like choices also? If 
that's the case I'd suggest termOffsets as the name. 
   



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/parsers.c:
##########
@@ -507,14 +507,21 @@ parse_le_uint8(uint8_t *number, size_t num_bits, PState 
*pstate)
     *number = (uint8_t)integer;
 }
 
-// Parse fill bits until end bitPos0b is reached
+// Parse fill bits up to alignmentInBits or end_bitPos0b
+
+void
+parse_align(size_t alignmentInBits, PState *pstate)
+{
+    size_t end_bitPos0b = (pstate->bitPos0b + alignmentInBits - 1) / 
alignmentInBits * alignmentInBits;

Review Comment:
   Please add the implied parentheses here. At first I though you were dividing 
by alignmentInBits squared. 
   
   The fact that this works to compute what the ending pos0b should be is not 
actually obvious.  I expected to see modular division operator ( x % y to get 
the remainder) here to figure out the amount of fill. 



##########
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryBooleanCodeGenerator.scala:
##########
@@ -52,13 +53,12 @@ trait BinaryBooleanCodeGenerator extends 
BinaryValueCodeGenerator {
     val unparseTrueRep = if (e.binaryBooleanTrueRep.isDefined) s"$trueRep" 
else s"~$falseRep"
 
     val initERDStatement = ""
-    val initSelfStatement = s"    $field = $initialValue;"
     val parseStatement =
-      s"""    parse_$function(&$field, $lengthInBits, $trueRep, $falseRep, 
pstate);
-         |    if (pstate->error) return;""".stripMargin
+      s"""$indent1$indent2    parse_$function(&$field, $lengthInBits, 
$trueRep, $falseRep, pstate);

Review Comment:
   These templates are getting pretty hard to interpret. really you have to 
just go look at examples of generated output to see if they make sense. 
   
   That said, I'm not sure how we do better. 
   
   Alternative technique is defining little scala abstract syntax tree nodes  
that represent these things, and we construct a structure of those which then 
generate the nitty gritty code. 
   That extra indirection to what is exactly being generated gets in the way of 
understanding just as much as the cluttered business of these templates now. 



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/unparsers.c:
##########
@@ -401,7 +401,14 @@ unparse_le_uint8(uint8_t number, size_t num_bits, UState 
*ustate)
     unparse_endian_uint64(LITTLE_ENDIAN_DATA, number, num_bits, ustate);
 }
 
-// Unparse fill bits until end bitPos0b is reached
+// Unparse fill bits up to alignmentInBits or end_bitPos0b
+
+void
+unparse_align(size_t alignmentInBits, const uint8_t fill_byte, UState *ustate)
+{
+    size_t end_bitPos0b = (ustate->bitPos0b + alignmentInBits - 1) / 
alignmentInBits * alignmentInBits;

Review Comment:
   Ditto my comments on the parse align. 



##########
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala:
##########
@@ -58,38 +56,209 @@ class CodeGeneratorState(private val root: ElementBase) {
     sb
   }
 
+  // Returns a name for the given element's element resource descriptor
+  private def erdName(context: ElementBase): String = {
+    val sb = buildName(context, new StringBuilder) ++= "ERD"
+    val name = sb.toString
+    name
+  }
+
   // Returns a name for the given element's C struct identifier
   private def cStructName(context: ElementBase): String = {
     val sb = buildName(context, new StringBuilder)
     val name = sb.toString
     name
   }
 
-  // Returns a name for the given element's element resource descriptor
-  private def erdName(context: ElementBase): String = {
-    val sb = buildName(context, new StringBuilder) ++= "ERD"
-    val name = sb.toString
-    name
+  // Returns the notation needed to access a C struct field.  We make some 
simplifying
+  // assumptions to make generating the field access easier:
+  // - the expression contains only a relative or absolute path, nothing else 
(e.g.,
+  //   the expression doesn't call any functions or perform any computation)
+  // - we can convert an absolute path to a rootElement()-> indirection
+  // - we can convert a relative path beginning with up dirs to a parents-> 
indirection
+  // - we can convert a relative path without any up dirs to an instance-> 
indirection
+  // - we can convert slashes in the path to dots in a C struct field access 
notation
+  private def cStructFieldAccess(expr: String): String = {
+    // Strip any namespace prefixes from expr and the root element's local 
name since
+    // C code uses only struct fields' names.
+    val rootName = root.namedQName.local
+    val exprPath = expr.replaceAll("/[^/:]+:", "/").stripPrefix(s"/$rootName")
+    val fieldAccess = if (exprPath.startsWith("/")) {
+      // Convert exprPath to a rootElement()-> indirection
+      val C = cStructName(root)
+      s"""(($C *)rootElement())->${exprPath.stripPrefix("/")}"""
+    } else if (exprPath.startsWith("../")) {
+      // Split exprPath into the up dirs and after the up dirs
+      val afterUpDirs = exprPath.split("\\.\\./").mkString
+      val upDirs = exprPath.stripSuffix(afterUpDirs)
+      // Count how many up dirs there are
+      val nUpDirs = upDirs.split('/').length
+      // Go up the stack that many times to get that struct's C type
+      val C = structs(nUpDirs).C
+      // Convert the up dirs to parents
+      val parents = upDirs.replaceAllLiterally("../", 
"parent->").stripSuffix("->")
+      // Convert exprPath to a parents-> indirection
+      s"""(($C *)instance->_base.$parents)->$afterUpDirs"""
+    } else {
+      // Convert exprPath to an instance-> indirection
+      s"""instance->$exprPath"""
+    }
+    // Finally, convert the field access to C struct dot notation
+    val notation = fieldAccess.replace('/', '.')
+    notation
   }

Review Comment:
   A walker is a *very* good idea.
   
   I had been concerned about how to refactor the expression code in 
daffodil-core and runtime1 because I've been thinking of it as us needing to 
return a data structure that an alternate runtime can traverse itself. 
   
   But if we write a walker, then alternate runtimes can use that to walk the 
structure, and the current objects with all their mixin-intensive organization 
do not require being refactored. 
   
   Let's plan to write a walker for expresions and replace just this part of 
runtime2 in a subsequent PR, even just reproducing the current limited 
functionality. But it would prove out the walker approach perhaps.
   



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.c:
##########
@@ -115,54 +123,47 @@ get_erd_ns(const ERD *erd)
     return erd->namedQName.ns;
 }
 
-// walkInfosetNode - recursively walk an infoset node and call
+// walkInfoset - walk each node of an infoset and call
+// VisitEventHandler methods
+
+const Error *
+walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode)
+{
+    const Error *error = handler->visitStartDocument(handler);
+
+    if (!error)
+    {
+        error = walkInfosetNode(handler, infoNode);
+    }
+    if (!error)
+    {
+        error = handler->visitEndDocument(handler);
+    }
+
+    return error;
+}
+
+// walkInfosetNode - walk each child of an infoset node and call
 // VisitEventHandler methods
 
 static const Error *
 walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode)
 {
-    const size_t      count = infoNode->erd->numChildren;
-    const ERD **const childrenERDs = infoNode->erd->childrenERDs;
-    const size_t *    offsets = infoNode->erd->offsets;
+    const size_t            numChildren = infoNode->erd->numChildren;
+    const ERD *const *const childrenERDs = infoNode->erd->childrenERDs;
+    const size_t *          offsets = infoNode->erd->offsets;
 
     // Start visiting the node
     const Error *error = handler->visitStartComplex(handler, infoNode);
 
-    // Walk the node's children recursively
-    for (size_t i = 0; i < count && !error; i++)
+    // Walk each child of the node
+    for (size_t i = 0; i < numChildren && !error; i++)
     {
-        const size_t offset = offsets[i];
         const ERD *  childERD = childrenERDs[i];

Review Comment:
   So here you are accessing the child element InfoNode's ERD not by looking at 
the node, but by looking in your own static ERD metadata. This is cool. But the 
child pointer below is to an InfoNode and the base InfoBase still  begins with 
a ref to that child element's ERD.
   
   Is there a plan to have the children be "ERD-less" as in their ERD is static 
info of their parent? This is potentially a large space savings as each child 
object can then be just a pure value really if they are fixed length, and as 
simple as a length and the data when they are variable length. 



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.h:
##########
@@ -105,7 +111,8 @@ typedef struct HexBinary
 
 typedef struct InfosetBase
 {
-    const ERD *erd;
+    const ERD *               erd;

Review Comment:
   Do you still use this erd pointer? Is it here for debugging purposes only? 
   
   I see you are storing a parent pointer here, and given that, one can compute 
the ERD as follows:
   
   Go to the parent, scan through its child offsets to find the offset to this 
object.
   The index of that offset in the vector of offsets is the child index.
   You can then index into the parent's childERDs to the ERD.
   
   So really, all you need is the parent pointer at the front of the InfoNode, 
and you can do a generic walk starting from any InfoNode.  That's what you need 
for a debugger., point at the start of anything and you can display it in a 
type-aware manner. 
   
   If walking in orderly fashion starting at the top, you wouldn't need to use 
this technique, as the more direct walk down works fine. 
   
   For the root of the infoset I'd suggest create a static dummy parent that 
exists just so that the logic for finding one's ERD works the same for every 
InfoNode including the root. 
   
   One word of overhead per child object is quite excellent. 



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libcli/daffodil_getopt.c:
##########
@@ -85,6 +86,18 @@ parse_daffodil_cli(int argc, char *argv[])
             daffodil_unparse.outfile = optarg;
             break;
         case 'V':
+            if (strcmp("limited", optarg) == 0 || strcmp("on", optarg) == 0)
+            {
+                daffodil_parse.validate = true;

Review Comment:
   I did not find any example schemas in this change set using 
fixed="someValue" properties. Is that expected?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [daffodil] mbeckerle commented on a diff in pull request #901: Add more daffodilC support (arrays, alignment)

Reply via email to