[jira] [Commented] (DAFFODIL-2202) Code Gen Framework

Mike Beckerle (Jira) Tue, 16 Feb 2021 08:16:06 -0800


    [ 
https://issues.apache.org/jira/browse/DAFFODIL-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285316#comment-17285316
 ]


Mike Beckerle commented on DAFFODIL-2202:
-----------------------------------------

With respect to choices, this is an area where the DFDL specification is 
imperfect and we may need sensible extensions to DFDL to cope with unparsing 
and choices with direct dispatch,  particularly in the case of hidden groups. 

Right now the DFDL spec says the dfdl:choiceDispatchKey property is used only 
when parsing.

This may be a mistake.

I think perhaps the c-daffodil backend should evaluate choice-dispatch-key 
expressions at unparse time to choose which branch is being unparsed. This 
expression refers backward-only to pre-existing 'tag' elements.

For now, this puts the burden on the application to properly set the value of 
the element (aka the 'tag' element) responsible for computing the choice 
dispatch key expression. This leaves open the possibility of the 
choiceDispatchKey expression evaluating to a branch key that reflects a choice 
branch that does NOT match the actual infoset being unparsed. This error would 
need to be detected and reported as an UnparseError.
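
As a rough sketch of that check (not the actual c-daffodil code; every name 
here is hypothetical), the generated unparser could map the tag value to a 
branch and compare it with the branch the infoset actually contains. The key 
values mirror the choiceBranchKey="1 2" / "3 4" example in the issue 
description:

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical branch identifiers; not part of the real runtime2 API.
enum { NO_BRANCH = -1, BRANCH_FOO = 0, BRANCH_BAR = 1 };

// Map the tag element's value to a branch, mirroring
// dfdl:choiceBranchKey="1 2" (foo) and "3 4" (bar).
static int branch_for_tag(int32_t tag)
{
    switch (tag)
    {
    case 1:
    case 2:
        return BRANCH_FOO;
    case 3:
    case 4:
        return BRANCH_BAR;
    default:
        return NO_BRANCH;
    }
}

// Returns NULL when the dispatch key agrees with the infoset, otherwise a
// message describing the UnparseError.
static const char *check_choice_for_unparse(int32_t tag, int infoset_branch)
{
    int dispatched = branch_for_tag(tag);
    if (dispatched == NO_BRANCH)
        return "UnparseError: choice dispatch key matches no branch key";
    if (dispatched != infoset_branch)
        return "UnparseError: choice dispatch key selects a branch that is "
               "not the one in the infoset";
    return NULL;
}
```

Here infoset_branch stands for whatever the infoset loader recorded about 
which branch element it found; how that is tracked is left open.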

Based on the last PR code review, I believe the above to be the case now.

In the future when these 'tag' elements are computed by way of 
dfdl:outputValueCalc, those expressions will examine the infoset and determine 
the proper values based on which choice branches are found to exist in the 
infoset. I.e., they would use functions like fn:exists() to test for elements 
using forward-looking expressions.  One could still write those expressions 
incorrectly and have them compute a tag element value that does not correspond 
to the actual infoset. (So an UnparseError is still possible in this unparse 
case, exactly as in the parse case.)
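
To illustrate the idea in C terms (a sketch only, with hypothetical names), 
the equivalent of an outputValueCalc such as 
{ if (fn:exists(../foo)) then 1 else 3 } would look forward at which branch 
is present and derive the tag from it. The expression and the values 1 and 3 
are assumptions chosen to match the branch keys in the issue description:

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical view of the infoset: which choice branch elements exist.
typedef struct
{
    bool has_foo;
    bool has_bar;
} ChoicePresence;

// Compute the tag element's value from the branch that exists, as a
// forward-looking outputValueCalc expression would. Returns false when the
// tag cannot be computed (no branch present, or both flagged present).
static bool compute_first_tag(const ChoicePresence *p, int32_t *tag_out)
{
    if (p->has_foo == p->has_bar)
        return false;
    *tag_out = p->has_foo ? 1 : 3; // 1 is a key for foo, 3 a key for bar
    return true;
}
```

A wrongly written calculation (say, returning 3 when foo exists) would 
still produce a tag that disagrees with the infoset, which is the error case 
described above.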

These dfdl:outputValueCalc properties on the 'tag' element(s) then enable 
dfdl:choiceDispatchKey expressions to be evaluated at unparse time to select 
the choice branch to be unparsed, exactly as at parse time. This would also 
apply to hidden groups containing choices. In hidden groups, the DFDL spec 
currently requires all elements inside any choice branch to have default values 
or dfdl:outputValueCalc, because the infoset won't contain any such elements, 
so they must be computed or defaulted. This is perhaps just not sufficient. If 
the dfdl:choiceDispatchKey were to be evaluated at unparse time, that would 
lift this restriction, and only the selected choice branch would require this 
behavior.

To enable this evaluation of dfdl:choiceDispatchKey at unparse time we would 
need to add a property. I suggest 
dfdlx:unparseEvaluatesChoiceDispatchKey="no/yes" with "no" as the default for 
backward compatibility.

For runtime 2 we *could* require this property to be "yes" or it's an SDE.  
This would, however, make runtime 2 incompatible with runtime 1 until this 
capability was also implemented for runtime 1.

 

> Code Gen Framework
> ------------------
>
>                 Key: DAFFODIL-2202
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2202
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Back End
>    Affects Versions: 2.4.0
>            Reporter: Mike Beckerle
>            Assignee: John Interrante
>            Priority: Minor
>
> We have built an initial C code generator backend for Apache Daffodil. 
> Currently the C code generator can generate C code to read and write binary 
> real and integer numbers, arrays of such numbers, and choices of alternative 
> structures within wire protocol packets. We plan to continue building out the 
> C code generator until it supports a minimal subset of the DFDL 1.0 
> specification for embedded devices.
> Here are some notes to keep track of changes that have been requested by 
> collaborators or reviewers so we don't forget them. If someone wants to help 
> (which would be appreciated), please add a comment to this issue or let the 
> dev list know in order to avoid duplication.
> 1. Validation of "fixed" values
> Is there a way for us to find a fixed="value" attribute in a schema within 
> runtime2 so we can generate C code to check that the corresponding C struct 
> member has the matching value? Suppose a schema has
> {code:xml}
>   <xs:complexType name="Limits">
>     <xs:sequence>
>       <xs:element name="sync0" fixed="210" type="idl:uint8"/>
>       <xs:element name="checksum" type="idl:uint16"/>
>     </xs:sequence>
>   </xs:complexType>
>   <xs:element name="LimitsDecl" type="idl:Limits"/>
> </xs:schema>
> {code}
> and a binary data file does not have the number 210 in sync0's position, we 
> would want the generated C code to report an error like:
> {noformat}
> Validation error: The value of element 'sync0' does not match the value of 
> its 'fixed' attribute.
> {noformat}
> 2. C struct/field name collisions
> To avoid possible name collisions, we should prepend struct names and field 
> names with namespace prefixes if their infoset elements have non-null 
> namespace prefixes.
> 3. Anonymous/multiple choice groups
> In addition to handling elements with xs:choice complex types, we should 
> detect anonymous choice groups and refine the choice runtime structure in 
> order to allow multiple choice groups to be inlined into parent elements. 
> Example schema and corresponding C code:
> {code:xml}
>   <xs:complexType name="NestedUnionType">
>     <xs:sequence>
>       <xs:element name="first_tag" type="idl:int32"/>
>       <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
>         <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
>         <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
>       </xs:choice>
>       <xs:element name="second_tag" type="idl:int32"/>
>       <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
>         <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
>         <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
>       </xs:choice>
>     </xs:sequence>
>   </xs:complexType>
> {code}
> {code:c}
> typedef struct NestedUnion
> {
>     InfosetBase _base;
>     int32_t     first_tag;
>     size_t      _choice_1; // choice of which union field to use
>     union
>     {
>         foo foo;
>         bar bar;
>     };
>     int32_t     second_tag;
>     size_t      _choice_2; // choice of which union field to use
>     union
>     {
>         fie fie;
>         fum fum;
>     };
> } NestedUnion;
> {code}
> 4. Choice dispatch key expressions
> We currently support only a very restricted and simple subset of choice 
> dispatch key expressions. We would like to refactor the DPath expression 
> compiler and make it generate C code in order to support more kinds of choice 
> dispatch key expressions.
> 5. No match between choice dispatch key and choice branch keys
> Right now c-daffodil is more strict than scala-daffodil when unparsing 
> infoset XML files with no matches (or mismatches) between choice dispatch 
> keys and branch keys. Perhaps c-daffodil should load such an XML file without 
> a no match processing error and unparse the infoset to a binary data file 
> without a no match processing error. We would have to code and call a choice 
> branch resolver in C which peeks at the next XML element, figures out which 
> branch that element indicates exists inside the choice group, and 
> initializes the choice and element runtime data (_choice and childNode->erd 
> member fields) accordingly. We probably would replace the initChoice() call 
> in walkInfosetNode() with a call to that choice branch resolver and we might 
> not need to call initChoice() in unparseSelf(). When I called initChoice() in 
> all these parse, walk, and unparse places, I was pondering removing the 
> _choice member field and calling initChoice() as a function to tell us which 
> element to visit next, but we probably should have a mutable choice runtime 
> data structure.
> 6. Floating point numbers
> Right now runtime2 prints floating point numbers in XML infoset files 
> slightly differently than runtime1 does. This means TDML tests may need to 
> use different XML infoset files for different runtimes. We should be able to 
> make the TDML Runner compare floating point numbers numerically, not 
> textually, so that TDML tests won't have to use two different XML infoset 
> files.
> 7. Arrays
> Instead of expanding arrays inline within childrenERDs, we may want to store 
> a single entry for an array in childrenERDs giving the array's offset and 
> size of all its elements. We would have to write code for special case 
> treatment of array member fields versus scalar member fields but we could 
> save space/memory in childrenERDs for use cases with very large arrays. An 
> array element's ERD should have minOccurs and maxOccurs where minOccurs is 
> unsigned and maxOccurs is signed with -1 meaning "unbounded". The actual 
> number of children in an array instance would have to be stored in the array 
> instance object (where, in the C struct or what?). An array node has to be a 
> different kind of infoset node with a place for this number of actual 
> children to be stored. Probably all ERDs should just get minOccurs and 
> maxOccurs, where a scalar is just one with 1, 1 as those values, an optional 
> element is 0, 1, and an array is any other legal combination: N, -1 or N, M 
> with N<=M. A restriction that minOccurs is 0, 1, or equal to maxOccurs (which 
> is not -1) is acceptable. A restriction that maxOccurs is 1, -1, or equal to 
> minOccurs is also fine (it means variable-length arrays always have an 
> unbounded number of elements).
> 8. Daffodil module/subdirectory names
> When Daffodil is ready to move from a 3.x to a 4.x release, rename the 
> modules to have shorter and easier to understand names as discussed in 
> DAFFODIL-2406.
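
The minOccurs/maxOccurs scheme in item 7 of the quoted description could be 
sketched roughly as below. The type and function names are hypothetical, and 
treating maxOccurs of 0 as invalid is an assumption of this sketch rather 
than something the description states:

```c
#include <stddef.h>
#include <stdint.h>

#define UNBOUNDED (-1) // maxOccurs value meaning "unbounded"

typedef enum
{
    OCCURS_INVALID,
    OCCURS_SCALAR,   // 1, 1
    OCCURS_OPTIONAL, // 0, 1
    OCCURS_ARRAY     // N, -1 or N, M with N <= M
} OccursKind;

// Hypothetical occurrence fields that every ERD would carry.
typedef struct
{
    size_t  minOccurs; // unsigned
    int32_t maxOccurs; // signed, -1 means unbounded
} Occurs;

// Classify an ERD's occurrence constraints as scalar, optional, or array.
static OccursKind classify_occurs(Occurs o)
{
    if (o.maxOccurs != UNBOUNDED)
    {
        // Bounded: require maxOccurs >= 1 and minOccurs <= maxOccurs.
        if (o.maxOccurs < 1 || o.minOccurs > (size_t)o.maxOccurs)
            return OCCURS_INVALID;
    }
    if (o.minOccurs == 1 && o.maxOccurs == 1)
        return OCCURS_SCALAR;
    if (o.minOccurs == 0 && o.maxOccurs == 1)
        return OCCURS_OPTIONAL;
    return OCCURS_ARRAY;
}
```

The actual number of children in an array instance would still need its own 
storage, which this sketch does not address.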



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
