tuxji commented on code in PR #901:
URL: https://github.com/apache/daffodil/pull/901#discussion_r1083048444
##########
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryValueCodeGenerator.scala:
##########
@@ -21,81 +21,65 @@ import org.apache.daffodil.dsom.ElementBase
import org.apache.daffodil.schema.annotation.props.gen.BitOrder
import org.apache.daffodil.schema.annotation.props.gen.ByteOrder
import org.apache.daffodil.schema.annotation.props.gen.LengthKind
-import org.apache.daffodil.schema.annotation.props.gen.OccursCountKind
import org.apache.daffodil.util.Maybe.Nope
// Base trait which provides common code to generate C code for primitive
value elements
trait BinaryValueCodeGenerator {
- // Generate C code for a primitive value element by calculating how many
times the element occurs
- // and applying the given partially applied function values the necessary
number of times. Intended
- // to be called by other traits which extend this trait, not directly by
Runtime2CodeGenerator.
- def binaryValueGenerateCode(e: ElementBase, addField: String => Unit,
validateFixed: String => Unit): Unit = {
+ // Generate C code for a primitive value element differently depending on
how many times the element occurs.
+ // Intended to be called by other traits which extend this trait, not
directly by Runtime2CodeGenerator.
+ def binaryValueGenerateCode(e: ElementBase, addField: String => Unit,
validateFixed: String => Unit, cgState: CodeGeneratorState): Unit = {
// For the time being this is a very limited back end.
// So there are some restrictions to enforce.
e.schemaDefinitionUnless(e.bitOrder eq BitOrder.MostSignificantBitFirst,
"Only dfdl:bitOrder 'mostSignificantBitFirst' is supported.")
e.schemaDefinitionUnless(e.maybeByteOrderEv == Nope ||
e.byteOrderEv.isConstant, "Runtime dfdl:byteOrder expressions not supported.")
e.schemaDefinitionUnless(e.lengthKind == LengthKind.Prefixed ||
e.elementLengthInBitsEv.isConstant, "Runtime dfdl:length expressions not
supported.")
- // Calculate how many times the element occurs in the data.
- val arraySize = e.occursCountKind match {
- case OccursCountKind.Fixed if e.maxOccurs > 1 => e.maxOccurs
- case OccursCountKind.Fixed if e.maxOccurs == 1 => 0
- case OccursCountKind.Implicit if e.minOccurs == 1 && e.maxOccurs == 1 => 0
- case _ => e.SDE("occursCountKind %s minOccurs %d maxOccurs %d is not
supported in C code generator",
- e.occursCountKind.toString, e.minOccurs, e.maxOccurs)
+ // Call the given partially applied function values with their remaining
unbound argument (deref)
+ val hasArray = cgState.arrayMaxOccurs(e) > 0
Review Comment:
I've added a hasArray function to CodeGeneratorState to keep track of when
an array is encountered and simplified the above code to `val deref = if
(cgState.hasArray) "[i]" else ""`.
##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libcli/daffodil_getopt.c:
##########
@@ -85,6 +86,18 @@ parse_daffodil_cli(int argc, char *argv[])
daffodil_unparse.outfile = optarg;
break;
case 'V':
+ if (strcmp("limited", optarg) == 0 || strcmp("on", optarg) == 0)
+ {
+ daffodil_parse.validate = true;
Review Comment:
Yes, that is expected since this change set does not make any changes to the
ex_nums example schema.
##########
daffodil-runtime2/src/test/c/examples/variablelen/generated_code.h:
##########
@@ -0,0 +1,22 @@
+#ifndef GENERATED_CODE_H
+#define GENERATED_CODE_H
+
+// clang-format off
+#include <stdbool.h> // for bool
+#include <stddef.h> // for size_t
+#include <stdint.h> // for uint8_t, int16_t, int32_t, int64_t, uint32_t,
int8_t, uint16_t, uint64_t
+#include "infoset.h" // for InfosetBase, HexBinary
+// clang-format on
+
+// Define infoset structures
+
+typedef struct expressionElement_
+{
+ InfosetBase _base;
+ uint32_t before;
+ uint32_t variablelen_size;
+ uint32_t variablelen[16];
Review Comment:
I've implemented a similar idea in HexBinary, which has a `bool dynamic`
field recording whether its `array` pointer points to static memory or
separately allocated memory on the heap. I didn't do that in this PR since
we're unlikely to run the code generator on DFDL schemas using large or
unbounded arrays at this time and we can do another PR if that does occur.
##########
daffodil-runtime2/src/test/resources/org/apache/daffodil/runtime2/variablelen.dfdl.xsd:
##########
@@ -0,0 +1,121 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<xs:schema
+ elementFormDefault="qualified"
+ xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+ xmlns:xs="http://www.w3.org/2001/XMLSchema">
+
+ <!-- Binary data format bindings -->
+
+ <xs:include
schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+ <xs:annotation>
+ <xs:appinfo source="http://www.ogf.org/dfdl/">
+ <dfdl:format
+ ref="GeneralFormat"
+ binaryBooleanFalseRep="0"
+ binaryBooleanTrueRep="1"
+ fillByte="%NUL;"
+ prefixIncludesPrefixLength="no"
+ representation="binary"/>
+ </xs:appinfo>
+ </xs:annotation>
+
+ <!-- Types for testing -->
+
+ <!-- Unsuitable since variable length array must allow 0 to 16 numbers -->
+ <xs:complexType name="fixedType">
+ <xs:sequence>
+ <xs:element name="before" type="xs:unsignedInt" />
+ <xs:element name="variablelen_size" type="xs:unsignedInt" />
+ <xs:element name="variablelen" type="xs:unsignedInt"
+ minOccurs="2" maxOccurs="2"
+ dfdl:occursCountKind="fixed" />
+ <!-- Yes, can have "after" element after "fixed" array -->
+ <xs:element name="after" type="xs:unsignedInt"
+ minOccurs="2" maxOccurs="2"
+ dfdl:occursCountKind="fixed" />
+ </xs:sequence>
+ </xs:complexType>
+
+ <!-- Unsuitable since "implicit" works only at end of binary data -->
+ <xs:complexType name="implicitType">
+ <xs:sequence>
+ <xs:element name="before" type="xs:unsignedInt" />
+ <xs:element name="variablelen_size" type="xs:unsignedInt" />
+ <xs:element name="variablelen" type="xs:unsignedInt"
+ minOccurs="0" maxOccurs="16"
+ dfdl:occursCountKind="implicit" />
+ <!-- No, cannot have "after" element after "implicit" array -->
+ </xs:sequence>
+ </xs:complexType>
+
+ <!-- Unsuitable since "parsed" works only at end of binary data -->
+ <xs:complexType name="parsedType">
+ <xs:sequence>
+ <xs:element name="before" type="xs:unsignedInt" />
+ <xs:element name="variablelen_size" type="xs:unsignedInt" />
+ <xs:element name="variablelen" type="xs:unsignedInt"
+ minOccurs="0" maxOccurs="16"
+ dfdl:occursCountKind="parsed" />
+ <!-- No, cannot have "after" element after "parsed" array -->
+ </xs:sequence>
+ </xs:complexType>
+
+ <!-- Suitable, although need to support variable length arrays in C -->
+ <xs:complexType name="expressionType">
+ <xs:sequence>
+ <xs:element name="before" type="xs:unsignedInt" />
+ <xs:element name="variablelen_size" type="xs:unsignedInt" />
+ <xs:element name="variablelen" type="xs:unsignedInt"
+ minOccurs="0" maxOccurs="16"
Review Comment:
If we encounter DFDL schemas with large or unbounded arrays, then we'll add
code to allocate memory from the heap and free it as needed for the information
set residing in memory. We can either say you must use unbounded arrays if you
want dynamic arrays or offer a tunable limit above which sufficiently large
arrays become dynamic arrays instead of static arrays automatically.
##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.c:
##########
@@ -115,54 +123,47 @@ get_erd_ns(const ERD *erd)
return erd->namedQName.ns;
}
-// walkInfosetNode - recursively walk an infoset node and call
+// walkInfoset - walk each node of an infoset and call
+// VisitEventHandler methods
+
+const Error *
+walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode)
+{
+ const Error *error = handler->visitStartDocument(handler);
+
+ if (!error)
+ {
+ error = walkInfosetNode(handler, infoNode);
+ }
+ if (!error)
+ {
+ error = handler->visitEndDocument(handler);
+ }
+
+ return error;
+}
+
+// walkInfosetNode - walk each child of an infoset node and call
// VisitEventHandler methods
static const Error *
walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode)
{
- const size_t count = infoNode->erd->numChildren;
- const ERD **const childrenERDs = infoNode->erd->childrenERDs;
- const size_t * offsets = infoNode->erd->offsets;
+ const size_t numChildren = infoNode->erd->numChildren;
+ const ERD *const *const childrenERDs = infoNode->erd->childrenERDs;
+ const size_t * offsets = infoNode->erd->offsets;
// Start visiting the node
const Error *error = handler->visitStartComplex(handler, infoNode);
- // Walk the node's children recursively
- for (size_t i = 0; i < count && !error; i++)
+ // Walk each child of the node
+ for (size_t i = 0; i < numChildren && !error; i++)
{
- const size_t offset = offsets[i];
const ERD * childERD = childrenERDs[i];
Review Comment:
If you look at the start of this function, we use this info node's ERD
pointer at lines 152-154 to look up the children's ERDs and offsets. The ERDs
and offsets are immutable data, but the children's ERD pointers are mutable.
When we walk into choices, we will call initChoice to determine which choice
branch we dispatch and initChoice will call the given choice's initERD on its
child node which will change its child's ERD pointer to the given choice
branch's ERD. Then we walk into the child node, call this walk function, and
it dereferences the just-updated ERD pointer at lines 152-154. I think we
still need InfoNode to have an ERD pointer and a parent pointer unless we can
move the mutable ERD pointer from InfoNode to PState/UState for the walk
functions to update on the fly. I'll make a note to look into that possibility
later.
##########
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryBooleanCodeGenerator.scala:
##########
@@ -52,13 +53,12 @@ trait BinaryBooleanCodeGenerator extends
BinaryValueCodeGenerator {
val unparseTrueRep = if (e.binaryBooleanTrueRep.isDefined) s"$trueRep"
else s"~$falseRep"
val initERDStatement = ""
- val initSelfStatement = s" $field = $initialValue;"
val parseStatement =
- s""" parse_$function(&$field, $lengthInBits, $trueRep, $falseRep,
pstate);
- | if (pstate->error) return;""".stripMargin
+ s"""$indent1$indent2 parse_$function(&$field, $lengthInBits,
$trueRep, $falseRep, pstate);
Review Comment:
If I had more time, I would look at other programs which have a C generator
backend in hope of finding a better way to generate C code (I would prefer
something more API-like than assembling string templates together). I know of
one program (the [Zig programming
language](https://en.wikipedia.org/wiki/Zig_(programming_language)) compiler)
which has an [additional backend](https://github.com/ziglang/zig/issues/3772)
to generate C code from Zig code. I believe the Zig compiler compiles all the
way to intermediate runtime representation (IIR, just barely above machine
level instructions) and then generates C code from these IIR, so I don't think
it has a convenient API we can use, however.
Steve suggested using a function to insert the indentation, so later I may
shorten these string templates by removing "$indent1$indent2 " and ending
them with .stripMargin(indent) instead of .stripMargin.
##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/AlignedMixin.scala:
##########
@@ -138,7 +138,7 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
// The separator is text in some encoding, might not be the same as this
term's encoding, and
// the alignment will be left where that text leaves it.
//
- private lazy val priorAlignmentApprox: AlignmentMultipleOf =
LV('priorAlignmentApprox) {
+ lazy val priorAlignmentApprox: AlignmentMultipleOf =
LV('priorAlignmentApprox) {
Review Comment:
I have reverted these changes.
##########
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala:
##########
@@ -58,38 +56,209 @@ class CodeGeneratorState(private val root: ElementBase) {
sb
}
+ // Returns a name for the given element's element resource descriptor
+ private def erdName(context: ElementBase): String = {
+ val sb = buildName(context, new StringBuilder) ++= "ERD"
+ val name = sb.toString
+ name
+ }
+
// Returns a name for the given element's C struct identifier
private def cStructName(context: ElementBase): String = {
val sb = buildName(context, new StringBuilder)
val name = sb.toString
name
}
- // Returns a name for the given element's element resource descriptor
- private def erdName(context: ElementBase): String = {
- val sb = buildName(context, new StringBuilder) ++= "ERD"
- val name = sb.toString
- name
+ // Returns the notation needed to access a C struct field. We make some
simplifying
+ // assumptions to make generating the field access easier:
+ // - the expression contains only a relative or absolute path, nothing else
(e.g.,
+ // the expression doesn't call any functions or perform any computation)
+ // - we can convert an absolute path to a rootElement()-> indirection
+ // - we can convert a relative path beginning with up dirs to a parents->
indirection
+ // - we can convert a relative path without any up dirs to an instance->
indirection
+ // - we can convert slashes in the path to dots in a C struct field access
notation
+ private def cStructFieldAccess(expr: String): String = {
+ // Strip any namespace prefixes from expr and the root element's local
name since
+ // C code uses only struct fields' names.
+ val rootName = root.namedQName.local
+ val exprPath = expr.replaceAll("/[^/:]+:", "/").stripPrefix(s"/$rootName")
+ val fieldAccess = if (exprPath.startsWith("/")) {
+ // Convert exprPath to a rootElement()-> indirection
+ val C = cStructName(root)
+ s"""(($C *)rootElement())->${exprPath.stripPrefix("/")}"""
+ } else if (exprPath.startsWith("../")) {
+ // Split exprPath into the up dirs and after the up dirs
+ val afterUpDirs = exprPath.split("\\.\\./").mkString
+ val upDirs = exprPath.stripSuffix(afterUpDirs)
+ // Count how many up dirs there are
+ val nUpDirs = upDirs.split('/').length
+ // Go up the stack that many times to get that struct's C type
+ val C = structs(nUpDirs).C
+ // Convert the up dirs to parents
+ val parents = upDirs.replaceAllLiterally("../",
"parent->").stripSuffix("->")
+ // Convert exprPath to a parents-> indirection
+ s"""(($C *)instance->_base.$parents)->$afterUpDirs"""
+ } else {
+ // Convert exprPath to an instance-> indirection
+ s"""instance->$exprPath"""
+ }
+ // Finally, convert the field access to C struct dot notation
+ val notation = fieldAccess.replace('/', '.')
+ notation
}
- // Returns true if the element has not been seen before (checking if a
- // map already contains the element, otherwise adding it to the map)
- def elementNotSeenYet(context: ElementBase): Boolean = {
- val key = cStructName(context)
- val alreadySeen = elementsAlreadySeen.contains(key)
- if (!alreadySeen)
- elementsAlreadySeen += (key -> context)
- !alreadySeen
+ // Returns the code needed to get the size of an array of elements, which
+ // may be either a constant (maxOccurs) or an expression (occursCount)
+ // which accesses a particular C struct field.
+ private def getOccursCount(e: ElementBase): String = {
+ val occursCount = e.occursCountKind match {
+ case OccursCountKind.Fixed =>
+ s""" UNUSED(instance);
+ | return ${e.maxOccurs};""".stripMargin
+ case OccursCountKind.Expression =>
+ // Extract expression from {...} in element's occursCount attribute
+ val xml = e.occursCountEv.toBriefXML()
Review Comment:
Good idea, I've made the change.
##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.c:
##########
@@ -115,54 +123,47 @@ get_erd_ns(const ERD *erd)
return erd->namedQName.ns;
}
-// walkInfosetNode - recursively walk an infoset node and call
+// walkInfoset - walk each node of an infoset and call
+// VisitEventHandler methods
+
+const Error *
+walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode)
+{
+ const Error *error = handler->visitStartDocument(handler);
+
+ if (!error)
+ {
+ error = walkInfosetNode(handler, infoNode);
+ }
+ if (!error)
+ {
+ error = handler->visitEndDocument(handler);
+ }
+
+ return error;
+}
+
+// walkInfosetNode - walk each child of an infoset node and call
// VisitEventHandler methods
static const Error *
walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode)
{
- const size_t count = infoNode->erd->numChildren;
- const ERD **const childrenERDs = infoNode->erd->childrenERDs;
- const size_t * offsets = infoNode->erd->offsets;
+ const size_t numChildren = infoNode->erd->numChildren;
+ const ERD *const *const childrenERDs = infoNode->erd->childrenERDs;
+ const size_t * offsets = infoNode->erd->offsets;
Review Comment:
Yes, you are correct that the offsets should be named childrenOffsets for
consistency. This PR is already very large so I will make that change in
another PR.
##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.h:
##########
@@ -105,7 +111,8 @@ typedef struct HexBinary
typedef struct InfosetBase
{
- const ERD *erd;
+ const ERD * erd;
Review Comment:
Yes, the code still needs the erd pointer to determine which choice branch
an element belongs to. I said above that maybe PState/UState could hold a
mutable erd pointer instead of InfoNode, but I wonder if that would make it
hard for a debugger to point at the start of any InfoNode and display it in a
type-aware manner. You'd have to do an orderly walk starting at the top for
PState/Ustate to have the correct erd pointer, and a debugger might not have a
PState/Ustate?
##########
daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/parsers.c:
##########
@@ -507,14 +507,21 @@ parse_le_uint8(uint8_t *number, size_t num_bits, PState
*pstate)
*number = (uint8_t)integer;
}
-// Parse fill bits until end bitPos0b is reached
+// Parse fill bits up to alignmentInBits or end_bitPos0b
+
+void
+parse_align(size_t alignmentInBits, PState *pstate)
+{
+ size_t end_bitPos0b = (pstate->bitPos0b + alignmentInBits - 1) /
alignmentInBits * alignmentInBits;
Review Comment:
I got the formula from
https://stackoverflow.com/questions/29925524/how-do-i-round-to-the-next-32-bit-alignment
where it was given as (n + k - 1) / k * k. I take it that ((n + k - 1) / k) *
k would be clearer. The comments said that formula would be better than
another formula using the modular division operator (%).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]