[GitHub] [daffodil] tuxji commented on a diff in pull request #901: Add more daffodilC support (arrays, alignment)

via GitHub Fri, 20 Jan 2023 13:38:35 -0800


tuxji commented on code in PR #901:
URL: https://github.com/apache/daffodil/pull/901#discussion_r1083048444



##########
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryValueCodeGenerator.scala:
##########
@@ -21,81 +21,65 @@ import org.apache.daffodil.dsom.ElementBase
 import org.apache.daffodil.schema.annotation.props.gen.BitOrder
 import org.apache.daffodil.schema.annotation.props.gen.ByteOrder
 import org.apache.daffodil.schema.annotation.props.gen.LengthKind
-import org.apache.daffodil.schema.annotation.props.gen.OccursCountKind
 import org.apache.daffodil.util.Maybe.Nope
 
 // Base trait which provides common code to generate C code for primitive 
value elements
 trait BinaryValueCodeGenerator {
 
-  // Generate C code for a primitive value element by calculating how many 
times the element occurs
-  // and applying the given partially applied function values the necessary 
number of times.  Intended
-  // to be called by other traits which extend this trait, not directly by 
Runtime2CodeGenerator.
-  def binaryValueGenerateCode(e: ElementBase, addField: String => Unit, 
validateFixed: String => Unit): Unit = {
+  // Generate C code for a primitive value element differently depending on 
how many times the element occurs.
+  // Intended to be called by other traits which extend this trait, not 
directly by Runtime2CodeGenerator.
+  def binaryValueGenerateCode(e: ElementBase, addField: String => Unit, 
validateFixed: String => Unit, cgState: CodeGeneratorState): Unit = {
     // For the time being this is a very limited back end.
     // So there are some restrictions to enforce.
     e.schemaDefinitionUnless(e.bitOrder eq BitOrder.MostSignificantBitFirst, 
"Only dfdl:bitOrder 'mostSignificantBitFirst' is supported.")
     e.schemaDefinitionUnless(e.maybeByteOrderEv == Nope || 
e.byteOrderEv.isConstant, "Runtime dfdl:byteOrder expressions not supported.")
     e.schemaDefinitionUnless(e.lengthKind == LengthKind.Prefixed || 
e.elementLengthInBitsEv.isConstant, "Runtime dfdl:length expressions not 
supported.")
 
-    // Calculate how many times the element occurs in the data.
-    val arraySize = e.occursCountKind match {
-      case OccursCountKind.Fixed if e.maxOccurs > 1 => e.maxOccurs
-      case OccursCountKind.Fixed if e.maxOccurs == 1 => 0
-      case OccursCountKind.Implicit if e.minOccurs == 1 && e.maxOccurs == 1 => 0
-      case _ => e.SDE("occursCountKind %s minOccurs %d maxOccurs %d is not 
supported in C code generator",
-        e.occursCountKind.toString, e.minOccurs, e.maxOccurs)
+    // Call the given partially applied function values with their remaining 
unbound argument (deref)
+    val hasArray = cgState.arrayMaxOccurs(e) > 0

Review Comment:
   I've added a hasArray function to CodeGeneratorState to keep track of when 
an array is encountered and simplified the above code to `val deref = if 
(cgState.hasArray) "[i]" else ""`.



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libcli/daffodil_getopt.c:
##########
@@ -85,6 +86,18 @@ parse_daffodil_cli(int argc, char *argv[])
             daffodil_unparse.outfile = optarg;
             break;
         case 'V':
+            if (strcmp("limited", optarg) == 0 || strcmp("on", optarg) == 0)
+            {
+                daffodil_parse.validate = true;

Review Comment:
   Yes, that is expected since this change set does not make any changes to the 
ex_nums example schema.  



##########
daffodil-runtime2/src/test/c/examples/variablelen/generated_code.h:
##########
@@ -0,0 +1,22 @@
+#ifndef GENERATED_CODE_H
+#define GENERATED_CODE_H
+
+// clang-format off
+#include <stdbool.h>  // for bool
+#include <stddef.h>   // for size_t
+#include <stdint.h>   // for uint8_t, int16_t, int32_t, int64_t, uint32_t, 
int8_t, uint16_t, uint64_t
+#include "infoset.h"  // for InfosetBase, HexBinary
+// clang-format on
+
+// Define infoset structures
+
+typedef struct expressionElement_
+{
+    InfosetBase _base;
+    uint32_t    before;
+    uint32_t    variablelen_size;
+    uint32_t    variablelen[16];

Review Comment:
   I've implemented a similar idea in HexBinary, which has a `bool dynamic` 
field recording whether its `array` pointer points to static memory or 
separately allocated memory on the heap.  I didn't do that in this PR since 
we're unlikely to run the code generator on DFDL schemas using large or 
unbounded arrays at this time and we can do another PR if that does occur.



##########
daffodil-runtime2/src/test/resources/org/apache/daffodil/runtime2/variablelen.dfdl.xsd:
##########
@@ -0,0 +1,121 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<xs:schema
+ elementFormDefault="qualified"
+ xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/";
+ xmlns:xs="http://www.w3.org/2001/XMLSchema";>
+
+  <!-- Binary data format bindings -->
+
+  <xs:include 
schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+  <xs:annotation>
+    <xs:appinfo source="http://www.ogf.org/dfdl/";>
+      <dfdl:format
+        ref="GeneralFormat"
+        binaryBooleanFalseRep="0"
+        binaryBooleanTrueRep="1"
+        fillByte="%NUL;"
+        prefixIncludesPrefixLength="no"
+        representation="binary"/>
+    </xs:appinfo>
+  </xs:annotation>
+
+  <!-- Types for testing -->
+
+  <!-- Unsuitable since variable length array must allow 0 to 16 numbers -->
+  <xs:complexType name="fixedType">
+    <xs:sequence>
+      <xs:element name="before" type="xs:unsignedInt" />
+      <xs:element name="variablelen_size" type="xs:unsignedInt" />
+      <xs:element name="variablelen" type="xs:unsignedInt"
+                  minOccurs="2" maxOccurs="2"
+                  dfdl:occursCountKind="fixed" />
+      <!-- Yes, can have "after" element after "fixed" array -->
+      <xs:element name="after" type="xs:unsignedInt"
+                  minOccurs="2" maxOccurs="2"
+                  dfdl:occursCountKind="fixed" />
+    </xs:sequence>
+  </xs:complexType>
+
+  <!-- Unsuitable since "implicit" works only at end of binary data -->
+  <xs:complexType name="implicitType">
+    <xs:sequence>
+      <xs:element name="before" type="xs:unsignedInt" />
+      <xs:element name="variablelen_size" type="xs:unsignedInt" />
+      <xs:element name="variablelen" type="xs:unsignedInt"
+                  minOccurs="0" maxOccurs="16"
+                  dfdl:occursCountKind="implicit" />
+      <!-- No, cannot have "after" element after "implicit" array -->
+    </xs:sequence>
+  </xs:complexType>
+
+  <!-- Unsuitable since "parsed" works only at end of binary data -->
+  <xs:complexType name="parsedType">
+    <xs:sequence>
+      <xs:element name="before" type="xs:unsignedInt" />
+      <xs:element name="variablelen_size" type="xs:unsignedInt" />
+      <xs:element name="variablelen" type="xs:unsignedInt"
+                  minOccurs="0" maxOccurs="16"
+                  dfdl:occursCountKind="parsed" />
+      <!-- No, cannot have "after" element after "parsed" array -->
+    </xs:sequence>
+  </xs:complexType>
+
+  <!-- Suitable, although need to support variable length arrays in C -->
+  <xs:complexType name="expressionType">
+    <xs:sequence>
+      <xs:element name="before" type="xs:unsignedInt" />
+      <xs:element name="variablelen_size" type="xs:unsignedInt" />
+      <xs:element name="variablelen" type="xs:unsignedInt"
+                  minOccurs="0" maxOccurs="16"

Review Comment:
   If we encounter DFDL schemas with large or unbounded arrays, then we'll add 
code to allocate memory from the heap and free it as needed for the information 
set residing in memory.  We can either say you must use unbounded arrays if you 
want dynamic arrays or offer a tunable limit above which sufficiently large 
arrays become dynamic arrays instead of static arrays automatically.



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.c:
##########
@@ -115,54 +123,47 @@ get_erd_ns(const ERD *erd)
     return erd->namedQName.ns;
 }
 
-// walkInfosetNode - recursively walk an infoset node and call
+// walkInfoset - walk each node of an infoset and call
+// VisitEventHandler methods
+
+const Error *
+walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode)
+{
+    const Error *error = handler->visitStartDocument(handler);
+
+    if (!error)
+    {
+        error = walkInfosetNode(handler, infoNode);
+    }
+    if (!error)
+    {
+        error = handler->visitEndDocument(handler);
+    }
+
+    return error;
+}
+
+// walkInfosetNode - walk each child of an infoset node and call
 // VisitEventHandler methods
 
 static const Error *
 walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode)
 {
-    const size_t      count = infoNode->erd->numChildren;
-    const ERD **const childrenERDs = infoNode->erd->childrenERDs;
-    const size_t *    offsets = infoNode->erd->offsets;
+    const size_t            numChildren = infoNode->erd->numChildren;
+    const ERD *const *const childrenERDs = infoNode->erd->childrenERDs;
+    const size_t *          offsets = infoNode->erd->offsets;
 
     // Start visiting the node
     const Error *error = handler->visitStartComplex(handler, infoNode);
 
-    // Walk the node's children recursively
-    for (size_t i = 0; i < count && !error; i++)
+    // Walk each child of the node
+    for (size_t i = 0; i < numChildren && !error; i++)
     {
-        const size_t offset = offsets[i];
         const ERD *  childERD = childrenERDs[i];

Review Comment:
   If you look at the start of this function, we use this info node's ERD 
pointer at lines 152-154 to look up the children's ERDs and offsets.  The ERDs 
and offsets are immutable data, but the children's ERD pointers are mutable.  
When we walk into choices, we will call initChoice to determine which choice 
branch we dispatch and initChoice will call the given choice's initERD on its 
child node which will change its child's ERD pointer to the given choice 
branch's ERD.  Then we walk into the child node, call this walk function, and 
it dereferences the just-updated ERD pointer at lines 152-154.  I think we 
still need InfoNode to have an ERD pointer and a parent pointer unless we can 
move the mutable ERD pointer from InfoNode to PState/UState for the walk 
functions to update on the fly.  I'll make a note to look into that possibility 
later.



##########
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryBooleanCodeGenerator.scala:
##########
@@ -52,13 +53,12 @@ trait BinaryBooleanCodeGenerator extends 
BinaryValueCodeGenerator {
     val unparseTrueRep = if (e.binaryBooleanTrueRep.isDefined) s"$trueRep" 
else s"~$falseRep"
 
     val initERDStatement = ""
-    val initSelfStatement = s"    $field = $initialValue;"
     val parseStatement =
-      s"""    parse_$function(&$field, $lengthInBits, $trueRep, $falseRep, 
pstate);
-         |    if (pstate->error) return;""".stripMargin
+      s"""$indent1$indent2    parse_$function(&$field, $lengthInBits, 
$trueRep, $falseRep, pstate);

Review Comment:
   If I had more time, I would look at other programs which have a C generator 
backend in hope of finding a better way to generate C code (I would prefer 
something more API-like than assembling string templates together). I know of 
one program (the [Zig programming 
language](https://en.wikipedia.org/wiki/Zig_(programming_language)) compiler) 
which has an [additional backend](https://github.com/ziglang/zig/issues/3772) 
to generate C code from Zig code. I believe the Zig compiler compiles all the 
way to intermediate runtime representation (IIR, just barely above machine 
level instructions) and then generates C code from these IIR, so I don't think 
it has a convenient API we can use, however.
   
   Steve suggested using a function to insert the indentation, so later I may 
shorten these string templates by removing "$indent1$indent2   " and ending 
them with .stripMargin(indent) instead of .stripMargin.



##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/AlignedMixin.scala:
##########
@@ -138,7 +138,7 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
   // The separator is text in some encoding, might not be the same as this 
term's encoding, and
   // the alignment will be left where that text leaves it.
   //
-  private lazy val priorAlignmentApprox: AlignmentMultipleOf = 
LV('priorAlignmentApprox) {
+  lazy val priorAlignmentApprox: AlignmentMultipleOf = 
LV('priorAlignmentApprox) {

Review Comment:
   I have reverted these changes.



##########
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala:
##########
@@ -58,38 +56,209 @@ class CodeGeneratorState(private val root: ElementBase) {
     sb
   }
 
+  // Returns a name for the given element's element resource descriptor
+  private def erdName(context: ElementBase): String = {
+    val sb = buildName(context, new StringBuilder) ++= "ERD"
+    val name = sb.toString
+    name
+  }
+
   // Returns a name for the given element's C struct identifier
   private def cStructName(context: ElementBase): String = {
     val sb = buildName(context, new StringBuilder)
     val name = sb.toString
     name
   }
 
-  // Returns a name for the given element's element resource descriptor
-  private def erdName(context: ElementBase): String = {
-    val sb = buildName(context, new StringBuilder) ++= "ERD"
-    val name = sb.toString
-    name
+  // Returns the notation needed to access a C struct field.  We make some 
simplifying
+  // assumptions to make generating the field access easier:
+  // - the expression contains only a relative or absolute path, nothing else 
(e.g.,
+  //   the expression doesn't call any functions or perform any computation)
+  // - we can convert an absolute path to a rootElement()-> indirection
+  // - we can convert a relative path beginning with up dirs to a parents-> 
indirection
+  // - we can convert a relative path without any up dirs to an instance-> 
indirection
+  // - we can convert slashes in the path to dots in a C struct field access 
notation
+  private def cStructFieldAccess(expr: String): String = {
+    // Strip any namespace prefixes from expr and the root element's local 
name since
+    // C code uses only struct fields' names.
+    val rootName = root.namedQName.local
+    val exprPath = expr.replaceAll("/[^/:]+:", "/").stripPrefix(s"/$rootName")
+    val fieldAccess = if (exprPath.startsWith("/")) {
+      // Convert exprPath to a rootElement()-> indirection
+      val C = cStructName(root)
+      s"""(($C *)rootElement())->${exprPath.stripPrefix("/")}"""
+    } else if (exprPath.startsWith("../")) {
+      // Split exprPath into the up dirs and after the up dirs
+      val afterUpDirs = exprPath.split("\\.\\./").mkString
+      val upDirs = exprPath.stripSuffix(afterUpDirs)
+      // Count how many up dirs there are
+      val nUpDirs = upDirs.split('/').length
+      // Go up the stack that many times to get that struct's C type
+      val C = structs(nUpDirs).C
+      // Convert the up dirs to parents
+      val parents = upDirs.replaceAllLiterally("../", 
"parent->").stripSuffix("->")
+      // Convert exprPath to a parents-> indirection
+      s"""(($C *)instance->_base.$parents)->$afterUpDirs"""
+    } else {
+      // Convert exprPath to an instance-> indirection
+      s"""instance->$exprPath"""
+    }
+    // Finally, convert the field access to C struct dot notation
+    val notation = fieldAccess.replace('/', '.')
+    notation
   }
 
-  // Returns true if the element has not been seen before (checking if a
-  // map already contains the element, otherwise adding it to the map)
-  def elementNotSeenYet(context: ElementBase): Boolean = {
-    val key = cStructName(context)
-    val alreadySeen = elementsAlreadySeen.contains(key)
-    if (!alreadySeen)
-      elementsAlreadySeen += (key -> context)
-    !alreadySeen
+  // Returns the code needed to get the size of an array of elements, which
+  // may be either a constant (maxOccurs) or an expression (occursCount)
+  // which accesses a particular C struct field.
+  private def getOccursCount(e: ElementBase): String = {
+    val occursCount = e.occursCountKind match {
+      case OccursCountKind.Fixed =>
+        s"""    UNUSED(instance);
+           |    return ${e.maxOccurs};""".stripMargin
+      case OccursCountKind.Expression =>
+        // Extract expression from {...} in element's occursCount attribute
+        val xml = e.occursCountEv.toBriefXML()

Review Comment:
   Good idea, I've made the change.



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.c:
##########
@@ -115,54 +123,47 @@ get_erd_ns(const ERD *erd)
     return erd->namedQName.ns;
 }
 
-// walkInfosetNode - recursively walk an infoset node and call
+// walkInfoset - walk each node of an infoset and call
+// VisitEventHandler methods
+
+const Error *
+walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode)
+{
+    const Error *error = handler->visitStartDocument(handler);
+
+    if (!error)
+    {
+        error = walkInfosetNode(handler, infoNode);
+    }
+    if (!error)
+    {
+        error = handler->visitEndDocument(handler);
+    }
+
+    return error;
+}
+
+// walkInfosetNode - walk each child of an infoset node and call
 // VisitEventHandler methods
 
 static const Error *
 walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode)
 {
-    const size_t      count = infoNode->erd->numChildren;
-    const ERD **const childrenERDs = infoNode->erd->childrenERDs;
-    const size_t *    offsets = infoNode->erd->offsets;
+    const size_t            numChildren = infoNode->erd->numChildren;
+    const ERD *const *const childrenERDs = infoNode->erd->childrenERDs;
+    const size_t *          offsets = infoNode->erd->offsets;

Review Comment:
   Yes, you are correct that the offsets should be named childrenOffsets for 
consistency.  This PR is already very large so I will make that change in 
another PR.



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.h:
##########
@@ -105,7 +111,8 @@ typedef struct HexBinary
 
 typedef struct InfosetBase
 {
-    const ERD *erd;
+    const ERD *               erd;

Review Comment:
   Yes, the code still needs the erd pointer to determine which choice branch 
an element belongs to.  I said above that maybe PState/UState could hold a 
mutable erd pointer instead of InfoNode, but I wonder if that would make it 
hard for a debugger to point at the start of any InfoNode and display it in a 
type-aware manner.  You'd have to do an orderly walk starting at the top for 
PState/Ustate to have the correct erd pointer, and a debugger might not have a 
PState/Ustate?



##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/parsers.c:
##########
@@ -507,14 +507,21 @@ parse_le_uint8(uint8_t *number, size_t num_bits, PState 
*pstate)
     *number = (uint8_t)integer;
 }
 
-// Parse fill bits until end bitPos0b is reached
+// Parse fill bits up to alignmentInBits or end_bitPos0b
+
+void
+parse_align(size_t alignmentInBits, PState *pstate)
+{
+    size_t end_bitPos0b = (pstate->bitPos0b + alignmentInBits - 1) / 
alignmentInBits * alignmentInBits;

Review Comment:
   I got the formula from 
https://stackoverflow.com/questions/29925524/how-do-i-round-to-the-next-32-bit-alignment
 where it was given as (n + k - 1) / k * k.  I take it that ((n + k - 1) / k) * 
k would be clearer.  The comments said that formula would be better than 
another formula using the modular division operator (%).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [daffodil] tuxji commented on a diff in pull request #901: Add more daffodilC support (arrays, alignment)

Reply via email to