mbeckerle commented on code in PR #901: URL: https://github.com/apache/daffodil/pull/901#discussion_r1080548480
########## daffodil-runtime2/src/test/resources/org/apache/daffodil/runtime2/variablelen.dfdl.xsd: ########## @@ -0,0 +1,121 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +<xs:schema + elementFormDefault="qualified" + xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" + xmlns:xs="http://www.w3.org/2001/XMLSchema"> + + <!-- Binary data format bindings --> + + <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/> + <xs:annotation> + <xs:appinfo source="http://www.ogf.org/dfdl/"> + <dfdl:format + ref="GeneralFormat" + binaryBooleanFalseRep="0" + binaryBooleanTrueRep="1" + fillByte="%NUL;" + prefixIncludesPrefixLength="no" + representation="binary"/> + </xs:appinfo> + </xs:annotation> + + <!-- Types for testing --> + + <!-- Unsuitable since variable length array must allow 0 to 16 numbers --> + <xs:complexType name="fixedType"> + <xs:sequence> + <xs:element name="before" type="xs:unsignedInt" /> + <xs:element name="variablelen_size" type="xs:unsignedInt" /> + <xs:element name="variablelen" type="xs:unsignedInt" + minOccurs="2" maxOccurs="2" + dfdl:occursCountKind="fixed" /> + <!-- Yes, can have "after" element after "fixed" array --> + <xs:element name="after" type="xs:unsignedInt" + minOccurs="2" maxOccurs="2" + dfdl:occursCountKind="fixed" /> + </xs:sequence> + </xs:complexType> + + <!-- Unsuitable since "implicit" works only at end of binary data --> + <xs:complexType name="implicitType"> + <xs:sequence> + <xs:element name="before" type="xs:unsignedInt" /> + <xs:element name="variablelen_size" type="xs:unsignedInt" /> + <xs:element name="variablelen" type="xs:unsignedInt" + minOccurs="0" maxOccurs="16" + dfdl:occursCountKind="implicit" /> + <!-- No, cannot have "after" element after "implicit" array --> + </xs:sequence> + </xs:complexType> + + <!-- Unsuitable since "parsed" works only at end of binary data --> + <xs:complexType name="parsedType"> + <xs:sequence> + <xs:element name="before" type="xs:unsignedInt" /> + <xs:element name="variablelen_size" type="xs:unsignedInt" /> + <xs:element name="variablelen" type="xs:unsignedInt" + minOccurs="0" maxOccurs="16" + dfdl:occursCountKind="parsed" /> + <!-- No, cannot have "after" element after "parsed" array --> + </xs:sequence> + </xs:complexType> + + <!-- Suitable, although need to support variable length arrays in C --> + <xs:complexType name="expressionType"> + <xs:sequence> + <xs:element name="before" type="xs:unsignedInt" /> + <xs:element name="variablelen_size" type="xs:unsignedInt" /> + <xs:element name="variablelen" type="xs:unsignedInt" + minOccurs="0" maxOccurs="16" Review Comment: So the representation you are supporting in the data stream requires the array to have a maxOccurs, and to not exceed it because the Infoset is going to pre-allocate space for the maximum? This makes sense, for now but you need some tunable limit above-which warnings are issued to users. Otherwise users will balk at unbounded, and just change it to 1000000000. Or perhaps it's just a hard limit on the max size? We do need to support actually dynamically sized arrays eventually, but we'll need to discuss how to do an efficient infoset for them. ########## daffodil-runtime2/src/test/c/examples/variablelen/generated_code.h: ########## @@ -0,0 +1,22 @@ +#ifndef GENERATED_CODE_H +#define GENERATED_CODE_H + +// clang-format off +#include <stdbool.h> // for bool +#include <stddef.h> // for size_t +#include <stdint.h> // for uint8_t, int16_t, int32_t, int64_t, uint32_t, int8_t, uint16_t, uint64_t +#include "infoset.h" // for InfosetBase, HexBinary +// clang-format on + +// Define infoset structures + +typedef struct expressionElement_ +{ + InfosetBase _base; + uint32_t before; + uint32_t variablelen_size; + uint32_t variablelen[16]; Review Comment: We're going to eventually need an infosetKind tag field in here to specify which of several representations are being used for this array. I think one should be able to have either directly inline (as you have it here), or a separately allocated structure (for larger arrays) where instead of elements here, this would reference the other object for the array slots. ########## daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.c: ########## @@ -115,54 +123,47 @@ get_erd_ns(const ERD *erd) return erd->namedQName.ns; } -// walkInfosetNode - recursively walk an infoset node and call +// walkInfoset - walk each node of an infoset and call +// VisitEventHandler methods + +const Error * +walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode) +{ + const Error *error = handler->visitStartDocument(handler); + + if (!error) + { + error = walkInfosetNode(handler, infoNode); + } + if (!error) + { + error = handler->visitEndDocument(handler); + } + + return error; +} + +// walkInfosetNode - walk each child of an infoset node and call // VisitEventHandler methods static const Error * walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode) { - const size_t count = infoNode->erd->numChildren; - const ERD **const childrenERDs = infoNode->erd->childrenERDs; - const size_t * offsets = infoNode->erd->offsets; + const size_t numChildren = infoNode->erd->numChildren; + const ERD *const *const childrenERDs = infoNode->erd->childrenERDs; + const size_t * offsets = infoNode->erd->offsets; Review Comment: offsets to what? For naming consistency should this be childrenOffsets? Or are these offsets to non-element sub-structures like choices also? If that's the case I'd suggest termOffsets as the name. ########## daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/parsers.c: ########## @@ -507,14 +507,21 @@ parse_le_uint8(uint8_t *number, size_t num_bits, PState *pstate) *number = (uint8_t)integer; } -// Parse fill bits until end bitPos0b is reached +// Parse fill bits up to alignmentInBits or end_bitPos0b + +void +parse_align(size_t alignmentInBits, PState *pstate) +{ + size_t end_bitPos0b = (pstate->bitPos0b + alignmentInBits - 1) / alignmentInBits * alignmentInBits; Review Comment: Please add the implied parentheses here. At first I though you were dividing by alignmentInBits squared. The fact that this works to compute what the ending pos0b should be is not actually obvious. I expected to see modular division operator ( x % y to get the remainder) here to figure out the amount of fill. ########## daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryBooleanCodeGenerator.scala: ########## @@ -52,13 +53,12 @@ trait BinaryBooleanCodeGenerator extends BinaryValueCodeGenerator { val unparseTrueRep = if (e.binaryBooleanTrueRep.isDefined) s"$trueRep" else s"~$falseRep" val initERDStatement = "" - val initSelfStatement = s" $field = $initialValue;" val parseStatement = - s""" parse_$function(&$field, $lengthInBits, $trueRep, $falseRep, pstate); - | if (pstate->error) return;""".stripMargin + s"""$indent1$indent2 parse_$function(&$field, $lengthInBits, $trueRep, $falseRep, pstate); Review Comment: These templates are getting pretty hard to interpret. really you have to just go look at examples of generated output to see if they make sense. That said, I'm not sure how we do better. Alternative technique is defining little scala abstract syntax tree nodes that represent these things, and we construct a structure of those which then generate the nitty gritty code. That extra indirection to what is exactly being generated gets in the way of understanding just as much as the cluttered business of these templates now. ########## daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/unparsers.c: ########## @@ -401,7 +401,14 @@ unparse_le_uint8(uint8_t number, size_t num_bits, UState *ustate) unparse_endian_uint64(LITTLE_ENDIAN_DATA, number, num_bits, ustate); } -// Unparse fill bits until end bitPos0b is reached +// Unparse fill bits up to alignmentInBits or end_bitPos0b + +void +unparse_align(size_t alignmentInBits, const uint8_t fill_byte, UState *ustate) +{ + size_t end_bitPos0b = (ustate->bitPos0b + alignmentInBits - 1) / alignmentInBits * alignmentInBits; Review Comment: Ditto my comments on the parse align. ########## daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala: ########## @@ -58,38 +56,209 @@ class CodeGeneratorState(private val root: ElementBase) { sb } + // Returns a name for the given element's element resource descriptor + private def erdName(context: ElementBase): String = { + val sb = buildName(context, new StringBuilder) ++= "ERD" + val name = sb.toString + name + } + // Returns a name for the given element's C struct identifier private def cStructName(context: ElementBase): String = { val sb = buildName(context, new StringBuilder) val name = sb.toString name } - // Returns a name for the given element's element resource descriptor - private def erdName(context: ElementBase): String = { - val sb = buildName(context, new StringBuilder) ++= "ERD" - val name = sb.toString - name + // Returns the notation needed to access a C struct field. We make some simplifying + // assumptions to make generating the field access easier: + // - the expression contains only a relative or absolute path, nothing else (e.g., + // the expression doesn't call any functions or perform any computation) + // - we can convert an absolute path to a rootElement()-> indirection + // - we can convert a relative path beginning with up dirs to a parents-> indirection + // - we can convert a relative path without any up dirs to an instance-> indirection + // - we can convert slashes in the path to dots in a C struct field access notation + private def cStructFieldAccess(expr: String): String = { + // Strip any namespace prefixes from expr and the root element's local name since + // C code uses only struct fields' names. + val rootName = root.namedQName.local + val exprPath = expr.replaceAll("/[^/:]+:", "/").stripPrefix(s"/$rootName") + val fieldAccess = if (exprPath.startsWith("/")) { + // Convert exprPath to a rootElement()-> indirection + val C = cStructName(root) + s"""(($C *)rootElement())->${exprPath.stripPrefix("/")}""" + } else if (exprPath.startsWith("../")) { + // Split exprPath into the up dirs and after the up dirs + val afterUpDirs = exprPath.split("\\.\\./").mkString + val upDirs = exprPath.stripSuffix(afterUpDirs) + // Count how many up dirs there are + val nUpDirs = upDirs.split('/').length + // Go up the stack that many times to get that struct's C type + val C = structs(nUpDirs).C + // Convert the up dirs to parents + val parents = upDirs.replaceAllLiterally("../", "parent->").stripSuffix("->") + // Convert exprPath to a parents-> indirection + s"""(($C *)instance->_base.$parents)->$afterUpDirs""" + } else { + // Convert exprPath to an instance-> indirection + s"""instance->$exprPath""" + } + // Finally, convert the field access to C struct dot notation + val notation = fieldAccess.replace('/', '.') + notation } Review Comment: A walker is a *very* good idea. I had been concerned about how to refactor the expression code in daffodil-core and runtime1 because I've been thinking of it as us needing to return a data structure that an alternate runtime can traverse itself. But if we write a walker, then alternate runtimes can use that to walk the structure, and the current objects with all their mixin-intensive organization do not require being refactored. Let's plan to write a walker for expresions and replace just this part of runtime2 in a subsequent PR, even just reproducing the current limited functionality. But it would prove out the walker approach perhaps. ########## daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.c: ########## @@ -115,54 +123,47 @@ get_erd_ns(const ERD *erd) return erd->namedQName.ns; } -// walkInfosetNode - recursively walk an infoset node and call +// walkInfoset - walk each node of an infoset and call +// VisitEventHandler methods + +const Error * +walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode) +{ + const Error *error = handler->visitStartDocument(handler); + + if (!error) + { + error = walkInfosetNode(handler, infoNode); + } + if (!error) + { + error = handler->visitEndDocument(handler); + } + + return error; +} + +// walkInfosetNode - walk each child of an infoset node and call // VisitEventHandler methods static const Error * walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode) { - const size_t count = infoNode->erd->numChildren; - const ERD **const childrenERDs = infoNode->erd->childrenERDs; - const size_t * offsets = infoNode->erd->offsets; + const size_t numChildren = infoNode->erd->numChildren; + const ERD *const *const childrenERDs = infoNode->erd->childrenERDs; + const size_t * offsets = infoNode->erd->offsets; // Start visiting the node const Error *error = handler->visitStartComplex(handler, infoNode); - // Walk the node's children recursively - for (size_t i = 0; i < count && !error; i++) + // Walk each child of the node + for (size_t i = 0; i < numChildren && !error; i++) { - const size_t offset = offsets[i]; const ERD * childERD = childrenERDs[i]; Review Comment: So here you are accessing the child element InfoNode's ERD not by looking at the node, but by looking in your own static ERD metadata. This is cool. But the child pointer below is to an InfoNode and the base InfoBase still begins with a ref to that child element's ERD. Is there a plan to have the children be "ERD-less" as in their ERD is static info of their parent? This is potentially a large space savings as each child object can then be just a pure value really if they are fixed length, and as simple as a length and the data when they are variable length. ########## daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libruntime/infoset.h: ########## @@ -105,7 +111,8 @@ typedef struct HexBinary typedef struct InfosetBase { - const ERD *erd; + const ERD * erd; Review Comment: Do you still use this erd pointer? Is it here for debugging purposes only? I see you are storing a parent pointer here, and given that, one can compute the ERD as follows: Go to the parent, scan through its child offsets to find the offset to this object. The index of that offset in the vector of offsets is the child index. You can then index into the parent's childERDs to the ERD. So really, all you need is the parent pointer at the front of the InfoNode, and you can do a generic walk starting from any InfoNode. That's what you need for a debugger., point at the start of anything and you can display it in a type-aware manner. If walking in orderly fashion starting at the top, you wouldn't need to use this technique, as the more direct walk down works fine. For the root of the infoset I'd suggest create a static dummy parent that exists just so that the logic for finding one's ERD works the same for every InfoNode including the root. One word of overhead per child object is quite excellent. ########## daffodil-runtime2/src/main/resources/org/apache/daffodil/runtime2/c/libcli/daffodil_getopt.c: ########## @@ -85,6 +86,18 @@ parse_daffodil_cli(int argc, char *argv[]) daffodil_unparse.outfile = optarg; break; case 'V': + if (strcmp("limited", optarg) == 0 || strcmp("on", optarg) == 0) + { + daffodil_parse.validate = true; Review Comment: I did not find any example schemas in this change set using fixed="someValue" properties. Is that expected? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
