I've been asked a Daffodil / DFDL question that I don't know how to answer. The question is:
How can I implement a function like get_offset_len(data, schema, field_path) -> (offset, length)?

Do you know a good way (using Daffodil library functions or DFDL constructs) to pass in some native data, a DFDL schema, and an XPath or DPath expression referring to an element in the DFDL schema, and get back the offset and length of that element's field within the native data? Alternatively, does Daffodil have a way to apply a DFDL schema to some native data, construct an infoset from the native data, and list all the elements in the infoset along with their DPath, offset, and length?

I searched the Daffodil codebase and wasn't able to find a specific API like that, although I may have missed something usable. I scanned the DFDL specification and did find a DFDL function called "dfdl:contentLength" in section 18.5.3. The function's signature is:

    dfdl:contentLength($node, $lengthUnits)

    Returns the length of the supplied node's SimpleContent region for elements of simple type, or ComplexContent region for elements of complex type. These regions are defined in Section 9.2 DFDL Data Syntax Grammar. The value is returned as an xs:unsignedLong. The second argument is of type xs:string and must be 'bytes', 'characters', or 'bits' (Schema Definition Error otherwise) and determines the units of length.

Being able to get each element's length looks like it could help, although a note in the same section says that the content length returned by dfdl:contentLength() excludes any alignment filling as well as any leading or trailing skip bytes. That is, the returned length tells you the length of the content but not the position of the content in the native data stream, which is what I was asked to find. Nevertheless, if the native data is not text but binary data with fixed-size fields, being able to list each content field with its length might be sufficient to deduce the position of each content field as well.

I wonder which would be easier to do?

1. Write a Scala program which calls some Daffodil API to parse some native data, construct an infoset from the native data, and list all the elements in the infoset along with their DPath, offset, and length. This would require Daffodil to have an API to iterate over each element in the infoset and return each element's content length.

2. Add DFDL constructs to a DFDL schema which call dfdl:contentLength and dfdl:outputValueCalc to append the same information to the infoset. This would require saving the infoset as XML and writing a program or command to read the information as a list.

3. Another way which I don't know about yet?

4. How would we handle any alignment filling as well as any leading or trailing skip bytes if the DFDL schema uses them?

Thanks,
John