tuxji opened a new pull request, #901:
URL: https://github.com/apache/daffodil/pull/901
Extend the C code generator to support more DFDL features such as arrays
with occursCountKind="expression" and padding variable length elements to
multiples of 4 bytes by following them with alignment="4" elements. Currently
the generator handles only finite inlined arrays, not unbounded arrays or
arrays with dynamically allocated memory. An array's declaration always uses
its maxOccurs attribute as its size and uses that same amount of static memory
space regardless of how many values the data file actually puts into the array
element.
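To illustrate the static-array limitation described above, here is a minimal
sketch, not the generator's actual output. The names sketch_packet, count,
values, and VALUES_MAX_OCCURS are hypothetical and assume a schema whose array
declares maxOccurs="16".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the array's maxOccurs attribute
#define VALUES_MAX_OCCURS 16

// Hypothetical struct for a complex element holding a count and an array
typedef struct sketch_packet
{
    uint32_t count;                     // how many values the data holds
    int32_t  values[VALUES_MAX_OCCURS]; // always reserves maxOccurs slots
} sketch_packet;

// Bytes reserved for the array, independent of the packet's contents
static size_t sketch_values_storage(const sketch_packet *packet)
{
    assert(packet != NULL);
    return sizeof(packet->values);
}
```

A packet holding one value and a packet holding sixteen values reserve the
same amount of array storage in this sketch.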
Note the new support for padding variable length elements is rudimentary,
barely handles even my specific alignment="4" use case, and needs reviewer
suggestions to improve it (see below).
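For reference, the arithmetic behind padding a variable length element to a
multiple of 4 bytes can be sketched as follows. The function name
sketch_fill_bytes is hypothetical and this is not the generated code; it only
shows how many fill bytes an alignment="4" element would need after an element
of a given length.

```c
#include <assert.h>
#include <stddef.h>

// Number of fill bytes needed after `length` bytes so that the next
// element starts on a multiple of `alignment` bytes
static size_t sketch_fill_bytes(size_t length, size_t alignment)
{
    assert(alignment > 0);
    return (alignment - (length % alignment)) % alignment;
}
```

For example, a 5-byte element needs 3 fill bytes to reach the next multiple of
4, while an 8-byte element needs none.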
DAFFODIL-2762
grammar/AlignedMixin.scala: Allow C code generator to call
priorAlignmentApprox and endingAlignmentApprox to find out whether a schema's
data format may require fill bytes between two elements. WOULD APPRECIATE
ADVICE on whether there is a better way than interrogating these values, since
Daffodil seems to optimize out and merge empty sequences with alignment="4"
attributes into adjacent elements' priorAlignmentApprox and
endingAlignmentApprox by the time the code generator inspects the compiled
schema elements. Also see calling code at line 596 in CodeGeneratorState.scala.
c/.clang-format: Remove statement macros no longer used by C code.
c/Makefile: Allow user to override name of test data file (TEST) on command
line in order to check multiple test data files in place within the same
directory. Rename two goals for better clarity (check -> test, test -> tests).
cli_errors.[ch]: Add new CLI_DIAGNOSTICS error message used by daffodil_main
and new CLI_INVALID_VALIDATE error message used by daffodil_getopts when
validation mode is turned on. In usage message, move old -V option (print
program version) to -v and add new -V option (choose validation mode).
daffodil_getopt.[ch]: Add bool validate to daffodil_parse_cli options.
Initialize validate to false by default and allow both -V limited and -V on to
set validate true (also accept and ignore -V off).
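The handling of the -V argument described above might look roughly like the
sketch below. The function name sketch_parse_validate_mode is hypothetical and
this is not the code in daffodil_getopt.c; a false return marks the place
where an invalid mode would presumably be reported with CLI_INVALID_VALIDATE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Interpret the argument of -V; returns false if the mode is not recognized
static bool sketch_parse_validate_mode(const char *mode, bool *validate)
{
    assert(mode != NULL && validate != NULL);
    if (strcmp(mode, "on") == 0 || strcmp(mode, "limited") == 0)
    {
        *validate = true; // both modes turn validation on
        return true;
    }
    if (strcmp(mode, "off") == 0)
    {
        return true; // accepted and ignored; validate keeps its value
    }
    return false; // unrecognized mode
}
```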
daffodil_main.c: If validate mode is on, any diagnostics will fail the parse
and exit with CLI_DIAGNOSTICS, making daffodilC's behavior more similar to
daffodil's behavior.
xml_reader.c: Expand if statement in xmlStartComplex for easier
breakpointing and debuggability.
xml_writer.c: Make header changes suggested by make iwyu.
xml_writer.h: Apply indentation corrections made by make format.
errors.[ch]: Add new ERR_ARRAY_BOUNDS error message used by
parse_check_bounds and unparse_check_bounds functions to ensure C code doesn't
read or write outside static arrays' bounds.
infoset.c: Reorder and split infoset walk functions into walkInfoset,
walkInfosetNode, walkInfosetNodeChild, and walkArray for better modularity and
readability. An infoset walk visits each element with a large switch statement
in `walkInfosetNodeChild' and visits each array of elements with a getArraySize
call followed by a for-loop statement in `walkArray'.
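The shape of such a walk can be sketched as below. All names prefixed with
sketch_ or Sketch are hypothetical and the structs are far simpler than the
real ERD and InfosetBase structs; the sketch only shows a switch statement
dispatching on a type code and an array being visited with a size call
followed by a for-loop.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical, simplified type codes
typedef enum { SKETCH_PRIMITIVE, SKETCH_ARRAY } SketchTypeCode;

typedef struct SketchNode SketchNode;
struct SketchNode
{
    SketchTypeCode    type;
    size_t            (*getArraySize)(const SketchNode *array); // arrays only
    const SketchNode *children;                                 // arrays only
};

static size_t sketch_walk_node(const SketchNode *node);

// Visit an array: ask for its runtime size, then loop over that many children
static size_t sketch_walk_array(const SketchNode *array)
{
    assert(array->getArraySize != NULL);
    size_t visited = 0;
    size_t size = array->getArraySize(array);
    for (size_t i = 0; i < size; i++)
    {
        visited += sketch_walk_node(&array->children[i]);
    }
    return visited;
}

// Visit one node, dispatching on its type code; returns primitives visited
static size_t sketch_walk_node(const SketchNode *node)
{
    switch (node->type)
    {
    case SKETCH_ARRAY:
        return sketch_walk_array(node);
    case SKETCH_PRIMITIVE:
        return 1;
    }
    return 0;
}

// Hypothetical size callback that always reports two occurrences
static size_t sketch_two_occurrences(const SketchNode *array)
{
    (void)array;
    return 2;
}
```

Only the occurrences reported by the size callback are visited, even though
the array may have room for more.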
infoset.h: Make initChoice take only 1 argument and add getArraySize with
only 1 argument too. Add ARRAY enum to TypeCode, getArraySize function pointer
to struct ERD, and parent pointer to struct InfosetBase to allow an infoset
walk to detect arrays, get their array sizes from preceding elements, and walk
the arrays.
p_endian.h: Reorder headers since make format changed their order.
parsers.[ch], unparsers.[ch]: Add parse_check_bounds and
unparse_check_bounds to avoid reading or writing outside an array's
minOccurs/maxOccurs bounds. Daffodil's Scala backend can handle any array size
regardless of minOccurs/maxOccurs, but C code can't handle any array size yet
since its C arrays use only static memory space.
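A bounds check of this kind can be sketched as follows. The enum and function
names are hypothetical and the signature is an assumption, not the signature
of parse_check_bounds or unparse_check_bounds; only the ERR_ARRAY_BOUNDS name
comes from this pull request.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical result codes standing in for the runtime's error reporting
typedef enum { SKETCH_OK = 0, SKETCH_ERR_ARRAY_BOUNDS } SketchError;

// Reject any occurrence count outside the array's minOccurs/maxOccurs range
// before it is used to index a statically sized C array
static SketchError sketch_check_bounds(size_t count, size_t min_occurs,
                                       size_t max_occurs)
{
    assert(min_occurs <= max_occurs);
    if (count < min_occurs || count > max_occurs)
    {
        return SKETCH_ERR_ARRAY_BOUNDS;
    }
    return SKETCH_OK;
}
```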
CodeGenerator.scala: Also generate padtest and variablelen examples every
time sbt compile is run.
BinaryBooleanCodeGenerator.scala, BinaryFloatCodeGenerator.scala,
BinaryIntegerKnownLengthCodeGenerator.scala,
BinaryValueCodeGenerator.scala, HexBinaryCodeGenerator.scala: Remove
initialValue and initSelf statements since C code now calls only initERD when
reading a binary data file or XML infoset file into memory. Calling both
initERD and initSelf occasionally overwrites fields of elements inside choices
at the wrong moment; ensuring initERD skips choices and initChoice calls only
the correct sub-initERD for choices prevents overwriting any fields. Calculate
correct indentation for statements in array for-loops and choice switch
statements.
CodeGeneratorState.scala: Use cStructFieldAccess, choiceDispatchField, and
getOccursCount to turn a DPath expression from choiceDispatchKey="{expression}"
or occursCount="{expression}" into a C struct field access dot notation.
Instead of turning schema components' lexical parents' names into slash paths,
concatenating a DPath expression as a URI, and normalizing the URI to get the
resulting path, cStructFieldAccess now turns up-paths (../) into parent pointer
dereferences and casts the last parent pointer to the correct type by indexing
into the stack of structs to get the corresponding struct's C name.
Dereferencing parent pointers the appropriate number of times seems to be the
most robust way to handle elements nested inside arrays correctly. Generate
arrays' ERD, offsets, childrenERDs, initERD, parseSelf, unparseSelf, and
getArraySize in new addArrayImplementation method. Remove all generation or
calls of initSelf statements and pass both ERD and parent to all initERD
functions to initialize all elements' ERD and parent pointers. I think we need
parent pointers to handle elements inside arrays and we also would find ERD
pointers very difficult to remove because initChoice calls initERD at runtime
to tell elements inside choices what kind of element they are. Since we're
removing initSelf, stop setting _choice fields to 0x777... (allow them to start
off zero) and use 1-based choices instead of 0-based choices. Add new
arrayMaxOccurs and hasChoice methods to make it easier for callers to detect
the presence of arrays and choices. Rename method addComplexTypeStatements to
addPerChildStatements and add new methods pushArray and popArray to generate C
statements for any child element taking choices and arrays into account.
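To make the up-path handling concrete, here is a minimal sketch of how an
expression such as "{ ../count }" evaluated inside an array item could become
a parent pointer dereference plus a cast. The struct and function names are
hypothetical and the layout is an assumption; the generated code's names and
fields differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ITEMS 4

// Hypothetical base struct carrying only a parent pointer
typedef struct SketchBase
{
    const struct SketchBase *parent;
} SketchBase;

typedef struct sketch_item
{
    SketchBase _base; // first member
    uint32_t   value;
} sketch_item;

typedef struct sketch_record
{
    SketchBase  _base; // must stay first so the cast below is valid
    uint32_t    count;
    sketch_item items[SKETCH_MAX_ITEMS];
} sketch_record;

// Store parent pointers once, before any parsing or unparsing
static void sketch_init_record(sketch_record *record)
{
    record->_base.parent = NULL;
    for (size_t i = 0; i < SKETCH_MAX_ITEMS; i++)
    {
        record->items[i]._base.parent = &record->_base;
    }
}

// One "../" step becomes one parent dereference, cast to the parent's type
static uint32_t sketch_parent_count(const sketch_item *item)
{
    assert(item->_base.parent != NULL);
    const sketch_record *parent = (const sketch_record *)item->_base.parent;
    return parent->count;
}
```

Because every item stores its own parent pointer, the lookup works the same
way for any index in the array.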
ElementParseAndUnspecifiedLengthCodeGenerator.scala: Simplify and refactor
code to generate C code for child elements using fewer CodeGeneratorState
method calls such as addPerChildStatements.
examples/*/generated_code.[ch]: Regenerate examples of generated C code.
Observe how special ERD initializations now happen for arrays, how initERD
functions store both ERD and parent pointers, how only truly necessary
HexBinary initializations are moved to initERD functions and all unnecessary
initializations disappear with the removal of initSelf functions, how
initChoice and getArraySize functions dereference parent pointers to get
dispatch keys and array sizes, how initChoice switch statements use 1-based
cases instead of 0-based cases, how parseSelf and unparseSelf fill padding, and
so on.
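As a rough illustration of 1-based choices, the sketch below lets a
zero-initialized _choice field mean "no branch selected yet" and numbers the
branches from 1. The struct, the dispatch key values, and the handling of an
unmatched key are all hypothetical; the regenerated examples show what the
generator really emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical struct with a dispatch key and a two-branch choice
typedef struct sketch_msg
{
    uint32_t tag;     // dispatch key read before the choice
    size_t   _choice; // 0 = not yet chosen, 1..2 = selected branch
    union
    {
        int32_t  a; // branch 1
        uint32_t b; // branch 2
    } u;
} sketch_msg;

// Select a branch from the dispatch key; returns false if no branch matches
static bool sketch_init_choice(sketch_msg *msg)
{
    assert(msg != NULL);
    switch (msg->tag)
    {
    case 1:
        msg->_choice = 1;
        return true;
    case 2:
        msg->_choice = 2;
        return true;
    default:
        msg->_choice = 0; // assumption: leave the choice unset on a mismatch
        return false;
    }
}
```

With 1-based branches, plain zero-initialized memory is already a valid "no
choice yet" state, so no sentinel value is needed.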
runtime2/{*.dat,*dat.xml,*.tdml}: Move existing data/infoset files into
`data' and `infosets' subdirectories, and add new data/infoset files to test
padded hexBinary elements and variable length arrays.
runtime2/ex_nums.tdml: Improve comments and test cases to provide more
commonality between testing daffodil and daffodilC.
runtime2/padtest.{dfdl.xsd,tdml}: Add structures and test cases to
demonstrate padding hexBinary elements to multiples of 4 bytes.
runtime2/variablelen.{dfdl.xsd,tdml}: Add structures and test cases for
different representations of arrays; daffodilC now handles two of these
representations (occursCountKind="fixed" and "expression") but not "implicit",
"parsed", or "stopValue", which probably don't occur in binary protocols.
TestExNums.scala: Fix IDE warnings and add more test cases.
TestPadTest.scala: Run padded hexBinary test cases with both daffodil and
daffodilC.
TestVariableLen.scala: Run variable length array test cases with both
daffodil and daffodilC.
Rat.scala: Ignore entire runtime2/data subdirectory instead of ignoring each
data file. We have more data files that can be put into data subdirectories
and removed from Rat.scala, but that can be done later.