[GitHub] [daffodil] tuxji opened a new pull request, #901: Add more daffodilC support (arrays, alignment)

GitBox Wed, 28 Dec 2022 14:35:52 -0800


tuxji opened a new pull request, #901:
URL: https://github.com/apache/daffodil/pull/901


   Extend the C code generator to support more DFDL features such as arrays 
with occursCountKind="expression" and padding variable length elements to 
multiples of 4 bytes by following them with alignment="4" elements.  Currently 
the generator handles only finite inlined arrays, not unbounded arrays or 
arrays with dynamically allocated memory.  An array's declaration always uses 
its maxOccurs attribute as its size and uses that same amount of static memory 
space regardless of how many values the data file actually puts into the array 
element.
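
   As a rough illustration of the static sizing described above, here is a 
minimal sketch and not the generated code itself.  ITEM_MAX_OCCURS, 
record_sketch, and record_array_bytes are hypothetical names chosen for this 
example; the real generated structs may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical bound standing in for an element's maxOccurs attribute
#define ITEM_MAX_OCCURS 16

// Hypothetical record: a count element followed by an inlined array
typedef struct
{
    uint32_t count;                 // occurrences actually present in the data
    int32_t  item[ITEM_MAX_OCCURS]; // storage is always maxOccurs elements
} record_sketch;

// Storage used by the array, independent of how many values the data holds
size_t record_array_bytes(const record_sketch *rec)
{
    assert(rec != NULL);
    return sizeof(rec->item);
}
```

   Whatever value count holds at runtime, the array occupies the same amount 
of memory, which is why only finite arrays can be handled this way.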
   
   Note the new support for padding variable length elements is rudimentary, 
barely handles even my specific alignment="4" use case, and needs reviewer 
suggestions to improve it (see below).
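
   For reviewers, the arithmetic behind padding to a multiple of 4 bytes can 
be sketched as follows.  This is only an illustration under the assumption 
that positions and alignments are counted in bytes; fill_bytes_sketch is a 
hypothetical helper, not a function in this pull request.

```c
#include <assert.h>
#include <stddef.h>

// Number of fill bytes needed to advance `position` (in bytes) to the next
// multiple of `alignment`; returns 0 when the position is already aligned.
size_t fill_bytes_sketch(size_t position, size_t alignment)
{
    assert(alignment > 0);
    size_t remainder = position % alignment;
    return remainder == 0 ? 0 : alignment - remainder;
}
```

   For example, a 5-byte variable length element followed by an 
alignment="4" element would need 3 fill bytes under these assumptions.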
   
   DAFFODIL-2762
   
   grammar/AlignedMixin.scala: Allow C code generator to call 
priorAlignmentApprox and endingAlignmentApprox to find out whether a schema's 
data format may require fill bytes between two elements. WOULD APPRECIATE 
ADVICE whether there is a better way than interrogating these values, since 
Daffodil seems to optimize out empty sequences with alignment="4" attributes 
and merge them into adjacent elements' priorAlignmentApprox and 
endingAlignmentApprox by the time the code generator inspects the compiled 
schema elements.  Also see calling code at line 596 in CodeGeneratorState.scala.
   
   c/.clang-format: Remove statement macros no longer used by C code.
   
   c/Makefile: Allow user to override name of test data file (TEST) on command 
line in order to check multiple test data files in place within the same 
directory.  Rename two goals for better clarity (check -> test, test -> tests).
   
   cli_errors.[ch]: Add new CLI_DIAGNOSTICS error message used by daffodil_main 
and new CLI_INVALID_VALIDATE error message used by daffodil_getopts when 
validation mode is turned on. In usage message, move old -V option (print 
program version) to -v and add new -V option (choose validation mode).
   
   daffodil_getopt.[ch]: Add bool validate to daffodil_parse_cli options. 
Initialize validate to false by default and allow both -V limited and -V on to 
set validate true (also accept and ignore -V off).
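
   The mapping from the -V argument to the validate flag could look roughly 
like the sketch below.  parse_validate_sketch is a hypothetical helper written 
for this description; the actual option handling in daffodil_getopt.c may be 
structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Map a -V argument to the validate flag.  Returns true if the argument is
// recognized, false otherwise.  The caller initializes *validate to false.
bool parse_validate_sketch(const char *arg, bool *validate)
{
    assert(arg != NULL && validate != NULL);
    if (strcmp(arg, "on") == 0 || strcmp(arg, "limited") == 0)
    {
        *validate = true;
        return true;
    }
    if (strcmp(arg, "off") == 0)
    {
        return true; // accepted and ignored: the flag keeps its default
    }
    return false; // caller would report an invalid validate mode
}
```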
   
   daffodil_main.c: If validate mode is on, any diagnostics will fail the parse 
and exit with CLI_DIAGNOSTICS, making daffodilC's behavior more similar to 
daffodil's behavior.
   
   xml_reader.c: Expand if statement in xmlStartComplex for easier 
breakpointing and debuggability.
   
   xml_writer.c: Make header changes suggested by make iwyu.
   
   xml_writer.h: Correct indentation (fixed by make format).
   
   errors.[ch]: Add new ERR_ARRAY_BOUNDS error message used by 
parse_check_bounds and unparse_check_bounds functions to ensure C code doesn't 
read or write outside static arrays' bounds.
   
   infoset.c: Reorder and split infoset walk functions into walkInfoset, 
walkInfosetNode, walkInfosetNodeChild, and walkArray for better modularity and 
readability.  An infoset walk visits each element with a large switch statement 
in `walkInfosetNodeChild' and visits each array of elements with a getArraySize 
call followed by a for-loop statement in `walkArray'.
   
   infoset.h: Make initChoice take only 1 argument and add getArraySize with 
only 1 argument too.  Add ARRAY enum to TypeCode, getArraySize function pointer 
to struct ERD, and parent pointer to struct InfosetBase to allow an infoset 
walk to detect arrays, get their array sizes from preceding elements, and walk 
the arrays.
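
   A much simplified sketch of how an infoset walk might use a getArraySize 
function pointer is shown below.  All type and function names ending in 
Sketch, as well as record_getArraySize and record_item_array_ERD, are 
hypothetical; the real ERD and InfosetBase structs carry more fields than 
shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct InfosetSketch InfosetSketch;

// Hypothetical runtime descriptor: only the field needed for this sketch
typedef struct
{
    size_t (*getArraySize)(const InfosetSketch *array_owner);
} ERDSketch;

// Hypothetical infoset base carried by every element
struct InfosetSketch
{
    const ERDSketch     *erd;
    const InfosetSketch *parent;
};

// Hypothetical complex element: a count followed by an inlined array
typedef struct
{
    InfosetSketch _base; // first member, so a pointer cast back is valid
    uint32_t      count;
    int32_t       item[8];
} RecordSketch;

// The array's size comes from the preceding count element
static size_t record_getArraySize(const InfosetSketch *array_owner)
{
    const RecordSketch *rec = (const RecordSketch *)array_owner;
    return rec->count;
}

static const ERDSketch record_item_array_ERD = {record_getArraySize};

// Visit each occurrence: one getArraySize call, then a for-loop
int64_t walk_array_sketch(const ERDSketch *arrayERD, const RecordSketch *rec)
{
    assert(arrayERD != NULL && arrayERD->getArraySize != NULL);
    size_t  size = arrayERD->getArraySize(&rec->_base);
    int64_t sum = 0;
    for (size_t i = 0; i < size; i++)
    {
        sum += rec->item[i]; // stand-in for visiting one element
    }
    return sum;
}
```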
   
   p_endian.h: Reorder headers since make format changed their order.
   
   parsers.[ch], unparsers.[ch]: Add parse_check_bounds and 
unparse_check_bounds to avoid reading or writing outside an array's 
minOccurs/maxOccurs bounds.  Daffodil's Scala backend can handle any array size 
regardless of minOccurs/maxOccurs, but C code can't handle any array size yet 
since its C arrays use only static memory space.
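
   The check itself amounts to a range test before any array access.  The 
sketch below shows the idea only; within_bounds_sketch is a hypothetical name 
and does not reproduce the signatures of parse_check_bounds or 
unparse_check_bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Returns true when `count` lies within [min_occurs, max_occurs], meaning a
// loop over `count` occurrences stays inside a static array of max_occurs.
bool within_bounds_sketch(size_t count, size_t min_occurs, size_t max_occurs)
{
    assert(min_occurs <= max_occurs);
    return count >= min_occurs && count <= max_occurs;
}
```

   A caller would record an array bounds error and stop before touching the 
array whenever this test fails.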
   
   CodeGenerator.scala: Also generate padtest and variablelen examples every 
time sbt compile is run.
   
   BinaryBooleanCodeGenerator.scala, BinaryFloatCodeGenerator.scala, 
BinaryIntegerKnownLengthCodeGenerator.scala, BinaryValueCodeGenerator.scala, 
HexBinaryCodeGenerator.scala: Remove 
initialValue and initSelf statements since C code now calls only initERD when 
reading a binary data file or XML infoset file into memory.  Calling both 
initERD and initSelf occasionally overwrites fields of elements inside choices 
at the wrong moment; ensuring initERD skips choices and initChoice calls only 
the correct sub-initERD for choices prevents overwriting any fields.  Calculate 
correct indentation for statements in array for-loops and choice switch 
statements.
   
   CodeGeneratorState.scala: Use cStructFieldAccess, choiceDispatchField, and 
getOccursCount to turn a DPath expression from choiceDispatchKey="{expression}" 
or occursCount="{expression}" into a C struct field access dot notation.  
Instead of turning schema components' lexical parents' names into slash paths, 
concatenating a DPath expression as a URI, and normalizing the URI to get the 
resulting path, cStructFieldAccess now turns up-paths (../) into parent pointer 
dereferences and casts the last parent pointer to the correct type by indexing 
into the stack of structs to get the corresponding struct's C name.  
Dereferencing parent pointers the appropriate number of times seems to be the 
most robust way to handle elements nested inside arrays correctly.  Generate 
arrays' ERD, offsets, childrenERDs, initERD, parseSelf, unparseSelf, and 
getArraySize in new addArrayImplementation method.  Remove all generation or 
calls of initSelf statements and pass both ERD and parent to all initERD 
functions to initialize all elements' ERD and parent pointers.  I think we need 
parent pointers to handle elements inside arrays and we also would find ERD 
pointers very difficult to remove because initChoice calls initERD at runtime 
to tell elements inside choices what kind of element they are.  Since we're 
removing initSelf, stop setting _choice fields to 0x777... (allow them to start 
off zero) and use 1-based choices instead of 0-based choices.  Add new 
arrayMaxOccurs and hasChoice methods to make it easier for callers to detect 
the presence of arrays and choices.  Rename method addComplexTypeStatements to 
addPerChildStatements and add new methods pushArray and popArray to generate C 
statements for any child element taking choices and arrays into account.
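
   To illustrate the parent pointer approach, here is a minimal sketch of how 
an up-path such as ../count might become a parent pointer dereference followed 
by a cast.  The struct and function names are hypothetical and the real 
generated code differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical infoset base holding only the parent pointer
typedef struct InfosetSketch
{
    const struct InfosetSketch *parent;
} InfosetSketch;

// Hypothetical inner element containing the array
typedef struct
{
    InfosetSketch _base;
    int32_t       item[4];
} InnerSketch;

// Hypothetical outer element holding the count the array depends on
typedef struct
{
    InfosetSketch _base; // first member, so a pointer cast back is valid
    uint32_t      count;
    InnerSketch   inner;
} OuterSketch;

// Initialization stores parent pointers in each element
void outer_init_sketch(OuterSketch *outer, const InfosetSketch *parent)
{
    outer->_base.parent = parent;
    outer->inner._base.parent = &outer->_base;
}

// occursCount="{ ../count }" seen from inner: one up-path step becomes one
// parent dereference, and the last parent is cast to its struct type
size_t inner_item_array_size_sketch(const InnerSketch *inner)
{
    assert(inner->_base.parent != NULL);
    const OuterSketch *outer = (const OuterSketch *)inner->_base.parent;
    return outer->count;
}
```

   Each additional ../ in the expression would add one more ->parent step 
before the final cast.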
   
   ElementParseAndUnspecifiedLengthCodeGenerator.scala: Simplify and refactor 
code to generate C code for child elements using fewer CodeGeneratorState 
method calls such as addPerChildStatements.
   
   examples/*/generated_code.[ch]: Regenerate examples of generated C code.  
Observe how special ERD initializations now happen for arrays, how initERD 
functions store both ERD and parent pointers, how only truly necessary 
HexBinary initializations are moved to initERD functions and all unnecessary 
initializations disappear with the removal of initSelf functions, how 
initChoice and getArraySize functions dereference parent pointers to get 
dispatch keys and array sizes, how initChoice switch statements use 1-based 
cases instead of 0-based cases, how parseSelf and unparseSelf fill padding, and 
so on.
   
   runtime2/{*.dat,*dat.xml,*.tdml}: Move existing data/infoset files into 
`data' and `infosets' subdirectories, and add new data/infoset files to test 
padded hexBinary elements and variable length arrays.
   
   runtime2/ex_nums.tdml: Improve comments and test cases to provide more 
commonality between testing daffodil and daffodilC.
   
   runtime2/padtest.{dfdl.xsd,tdml}: Add structures and test cases to 
demonstrate padding hexBinary elements to multiples of 4 bytes.
   
   runtime2/variablelen.{dfdl.xsd,tdml}: Add structures and test cases for 
different representations of arrays; daffodilC now handles two of these 
representations (occursCountKind="fixed" and "expression") but not "implicit", 
"parsed", or "stopValue", which probably don't occur in binary protocols.
   
   TestExNums.scala: Fix IDE warnings and add more test cases.
   
   TestPadTest.scala: Run padded hexBinary test cases with both daffodil and 
daffodilC.
   
   TestVariableLen.scala: Run variable length array test cases with both 
daffodil and daffodilC.
   
   Rat.scala: Ignore entire runtime2/data subdirectory instead of ignoring each 
data file.  We have more data files that can be put into data subdirectories 
and removed from Rat.scala, but do that later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

