Re: Simplified DFDL layering/base64 proposal

Mike Beckerle Mon, 09 Apr 2018 07:43:26 -0700

I've updated the VCalendar example to fix the typo, and I've wrapped an 
xs:sequence carrying the dfdl:ref="tns:folded" around the ProdID element.

You are correct: to create a reusable type that includes folding, you have to 
use a complex type, since only complex types can have the xs:sequence needed to 
carry the layering properties.

This problem is an artifact of the non-uniformity of simple/complex types in 
XSD, and there are lots of places in DFDL like this where you need a complex 
type in order to describe the representation of what ultimately one thinks of 
as a simple value, so you end up with the "value element problem".

This needs a general fix outside the scope of this layering proposal, along the 
lines of allowing a simple type to carry a dfdl:hiddenGroupRef property so that 
the simple element can have a sequence or choice group containing children 
elements to hold the complex representation of that simple type.

Your second observation is, I think, also correct: after running the 
decoding layering algorithm, one might have more data than one needs to satisfy 
the parsing.

When parsing this would be ignored/skipped.
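
As a rough illustration of that parse-side behavior, here is a small Python 
sketch (not Daffodil code) using the "Zm9vWA==bar" data from Steve's message 
quoted below. The function name parse_layered and its parameters are 
hypothetical; it assumes an explicit layer length counted in characters and 
ASCII text.

```python
import base64

def parse_layered(data: str, layer_length: int, foo_length: int, bar_length: int):
    """Hypothetical helper: decode a fixed-length base64 layer, parse foo
    from the decoded data, then parse bar from what follows the layer."""
    # The layer covers exactly layer_length characters of the input
    # (layerLengthKind="explicit", layerLength="8" in the example schema).
    layer_text = data[:layer_length]
    decoded = base64.b64decode(layer_text).decode("ascii")

    # foo consumes only the first foo_length decoded characters.
    foo = decoded[:foo_length]
    # Anything left in the decoded layer is unused and simply skipped.
    unused = decoded[foo_length:]

    # bar starts right after the layer in the underlying data.
    bar = data[layer_length:layer_length + bar_length]
    return foo, unused, bar

foo, unused, bar = parse_layered("Zm9vWA==bar", 8, 3, 3)
print(foo, unused, bar)  # foo X bar
```

The leftover "X" is returned here only to make it visible; under the behavior 
described above it would just be dropped.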

Unparsing is a bit trickier, as this data may need to be provided - e.g., as 
padding - even though it is not carrying any data. It may just be an algorithm 
requirement. We certainly anticipate that data will have to be byte-oriented, 
that is, no final partial byte can be represented. So at least filling the 
final byte out with bits from fillByte may be necessary, but for many 
algorithms the requirement may be that the data is padded/filled to a certain 
byte boundary/alignment. It would be the schema author's responsibility to make 
sure unparsing the data provides a representation to the layering unparser that 
satisfies these requirements.
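
To make the unparse-side requirement concrete, here is a Python sketch of 
what "providing enough data" could look like for the base64 example quoted 
below. It is only an illustration of the constraint, not behavior defined by 
the proposal: the function unparse_layer, the choice of fill byte (0x58, "X", 
picked to reproduce the example data), and the idea of padding the payload 
before encoding are all assumptions.

```python
import base64

def unparse_layer(payload: bytes, layer_length: int, fill_byte: int = 0x58) -> str:
    """Hypothetical helper: pad the payload with fill_byte until its base64
    encoding is exactly layer_length characters long."""
    padded = payload
    encoded = base64.b64encode(padded).decode("ascii")

    # Base64 output grows in steps of 4 characters, so keep adding fill
    # bytes until the encoded form reaches the required layer length.
    while len(encoded) < layer_length:
        padded += bytes([fill_byte])
        encoded = base64.b64encode(padded).decode("ascii")

    # If the payload was already too large, or the layer length is not a
    # multiple of 4, the requirement cannot be met by padding.
    if len(encoded) != layer_length:
        raise ValueError("payload cannot be encoded in exactly layer_length characters")
    return encoded

print(base64.b64encode(b"foo").decode("ascii"))  # Zm9v (only 4 characters)
print(unparse_layer(b"foo", 8))                  # Zm9vWA==
```

In a real schema the extra byte would have to come from the schema itself 
(for example, an explicit padding element inside the layered sequence), since 
the layer transform is not assumed to add it.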

I will add something to this effect to the proposal page.

________________________________
From: Steve Lawrence <[email protected]>
Sent: Monday, April 9, 2018 10:18:29 AM
To: [email protected]; Mike Beckerle
Subject: Re: Simplified DFDL layering/base64 proposal

I like this simplified version a lot! Some questions:

1) In the VCalendar example, ProdID is an element with a dfdl:ref (typo
of dfdl:formatRef) to tns:folded, which contains dfdl:layer* properties.
But layering properties are only allowed on xs:sequence's. I assume this
was just an example from the old proposal that wasn't fixed up, and
should be something like this instead:

  <xs:sequence dfdl:ref="tns:folded">
    <xs:element name="ProdID" type="xs:string" dfdl:initiator="PRODID:"
                minOccurs="0"/>
  </xs:sequence>

Which raises a small issue with simple types: This layering transform
now applies to the initiator/terminator of the simple type. If you do
not want a layer to apply to those but only to the value, you'd need to
make it a complex type with a "Value" element. I'm not sure this is a
big deal, but layering on simple types might get a little messier in
some cases if the initiators/terminators shouldn't be transformed.

2) What happens with unused data in an overlying layer? For example:

Say we have something like

  <dfdl:defineFormat name="base64">
    <dfdl:format layerTransform="base64" layerLengthKind="explicit"
                 layerLength="8" ... />
  </dfdl:defineFormat>

  <xs:sequence>
    <xs:sequence dfdl:ref="base64">
      <xs:element name="foo" type="xs:string" dfdl:length="3" />
    </xs:sequence>
    <xs:element name="bar" type="xs:string" dfdl:length="3" />
  </xs:sequence>

Assume the data is this:

  Zm9vWA==bar

The first 8 characters are base64 encoded, and decode to "fooX". The foo
element would only consume three of those characters, so the last "X"
character would be not consumed by foo.

The length of the layer transform was 8 characters, so bar would start
parsing after that and consume the "bar" letters.

So what happens to the unconsumed "X" character? Is it just thrown away?
This seems consistent with how we treat a complex element with explicit
length where the children do not consume the full length. Or is this a
Runtime SDE? Related, on the unparse side, when we base64 encode "foo"
it is only 4 characters, but the layerLength is 8. Are pad characters
inserted to fill that out to 8? Do we need a layerPadCharacter and other
related pad properties?

- Steve

On 04/06/2018 04:10 PM, Mike Beckerle wrote:
> Never mind that one. I've simplified it even further:
>
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+base64+-+Super+Simplified
>
>
> ________________________________
> From: Mike Beckerle
> Sent: Friday, April 6, 2018 3:12:31 PM
> To: [email protected]
> Subject: Simplified DFDL layering/base64 proposal
>
>
> On looking into implementation complexity I've come up with simplifications 
> that don't reduce expressive power at all, but massively simplify 
> implementation (and documentation, and testing...) burdens.
>
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+base64+-+Simplified
>
>
>
>
