[ 
https://issues.apache.org/jira/browse/DAFFODIL-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Lawrence updated DAFFODIL-1598:
-------------------------------------
    Fix Version/s: 4.2.0
                       (was: 4.1.0)

> Unparser: For strings that truncate, the dfdl:valueLength function cannot 
> suspend
> ---------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-1598
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1598
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End, General
>            Reporter: Mike Beckerle
>            Assignee: Olabusayo Kilo
>            Priority: Major
>             Fix For: 4.2.0
>
>
> When unparsing, the strategy used to determine the target length for an 
> element is to determine the value length by allowing the unparsing to go 
> forward but into a buffering data output stream. The value length is 
> determined by capturing the starting position, the ending position (both of 
> which are in the buffered output), and subtracting. 
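
The measurement described above can be sketched as follows. This is only an illustration in Python, not Daffodil code: `measure_value_length` and the use of `io.BytesIO` as the buffering output stream are assumptions made for the example.

```python
import io

def measure_value_length(unparse_value, buffered_out):
    """Return how many bytes unparse_value writes into buffered_out."""
    start = buffered_out.tell()   # capture the starting position
    unparse_value(buffered_out)   # let unparsing go forward into the buffer
    end = buffered_out.tell()     # capture the ending position
    return end - start            # value length = end - start

# Hypothetical usage: some earlier content is already buffered.
buf = io.BytesIO()
buf.write(b"HDR")
length = measure_value_length(lambda out: out.write("hello".encode("utf-8")), buf)
```

Both positions are taken from the buffer, so nothing has to be committed to the real output before the length is known.
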
> However, if the string is a truncated string (lengthKind 'explicit' or 
> 'implicit', with dfdl:truncateSpecifiedLengthString 'yes'), we have a cycle 
> because the value-length of an element is the post-truncation length, yet to 
> determine the target length we will often need to know the value-length.
> For example: 
> {code}
> <xs:element name="len" type="xs:int"
>   dfdl:outputValueCalc="{
>     if (dfdl:valueLength(../data) lt 100) then 100
>     else dfdl:valueLength(../data)
>   }" />
> <xs:element name="data" type="xs:string"
>   dfdl:lengthKind='explicit' dfdl:length='{ ../len }'
>   dfdl:truncateSpecifiedLengthString='yes' />
> {code}
> In the above, the length expression of 'data' depends on the value of the
> 'len' element, and the value of 'len' requires the valueLength of 'data'.
> Without truncation this would work: we could unparse the value of 'data'
> into a buffering data output stream and measure its length. That would
> unblock the suspended dfdl:outputValueCalc expression of 'len', which in
> turn would allow the length expression of 'data' to be evaluated. We would
> then know how much padding/fill to add to the representation of 'data'.
> But if the 'data' string can be truncated, as in the example above, then this
> fails. We can't unparse it and derive the value length from the unparsed
> representation in a buffer, because the valueLength is supposed to be the
> post-truncation length.
> So a cyclic deadlock will occur when unparsing schemas like the one above.
> The question is: is this a problem? The above example could be re-written as:
> {code}
> <xs:element name="len" type="xs:int"
>   dfdl:outputValueCalc="{
>     if (fn:string-length(../data) lt 100) then 100
>     else fn:string-length(../data)
>   }" />
> <xs:element name="data" type="xs:string"
>   dfdl:lengthKind='explicit' dfdl:length='{ ../len }'
>   dfdl:truncateSpecifiedLengthString='yes' />
> {code}
> The fn:string-length function provides the pre-truncation length of the
> element. This eliminates the cycle.
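
To see why the rewritten schema has no cycle, here is a small Python model of it. It is a sketch under assumptions: lengths are counted in characters, truncation drops characters on the right, and padding uses spaces. The function names are hypothetical and none of this is Daffodil's implementation.

```python
def output_value_calc_len(data_value: str) -> int:
    """Model of the rewritten dfdl:outputValueCalc for 'len'."""
    n = len(data_value)  # stands in for fn:string-length(../data)
    return 100 if n < 100 else n

def unparse_data(data_value: str, length: int, pad: str = " ") -> str:
    """Truncate or pad to exactly `length` characters (assumed policy)."""
    return data_value[:length].ljust(length, pad)

short = "abc"
long_value = "x" * 150

# 'len' is computed from the infoset value alone, before 'data' is unparsed.
len_short = output_value_calc_len(short)
len_long = output_value_calc_len(long_value)

out_short = unparse_data(short, len_short)      # padded up to 100
out_long = unparse_data(long_value, len_long)   # fits exactly, no truncation
```

Because `output_value_calc_len` needs only the infoset string, 'len' never waits on the unparsed representation of 'data'.
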
> We can detect this error at runtime. The ElementRuntimeData structure
> contains optTruncateSpecifiedLengthString, which the valueLength function
> can examine. It can then report an error that dfdl:valueLength is being
> called on a truncated string, and the diagnostic can suggest that
> fn:string-length is preferable.
> However, it's not an error to call dfdl:valueLength on a string that may be 
> truncated. It's only an error to do so in a way that creates this deadlock. 
> The DFDL spec does not preclude calling dfdl:valueLength on a string element 
> that might be truncated, and the spec is clear that this would be the 
> post-truncation value-region length. 
> So we should not produce a runtime SDE every time dfdl:valueLength is called
> on a string that might be truncated. Instead, we need a mechanism that
> examines the deadlocked cycle and checks whether it is caused by taking
> dfdl:valueLength of a string that might be truncated. We can then issue the
> runtime SDE about this particular cyclic definition problem only in the
> cases where it is actually creating a cycle.
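
One way such a mechanism could look is sketched below in Python. Everything here is hypothetical: the `Wait` record, `find_cycle`, and `diagnose` are names invented for this illustration, and the sketch assumes the runtime can list which suspended computation is blocked on which element and why. It is not how Daffodil represents suspensions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Wait:
    waiter: str                # suspended computation (named by its element)
    target: str                # element it is blocked on
    reason: str                # "valueLength" or "elementValue"
    truncatable: bool = False  # target is a string that may be truncated

def find_cycle(waits):
    """Return the Wait edges that form a cycle, or None if there is none."""
    by_waiter = {}
    for w in waits:
        by_waiter.setdefault(w.waiter, []).append(w)
    finished = set()  # nodes already known not to reach a cycle

    def dfs(node, path):
        on_path = [e.waiter for e in path]
        if node in on_path:
            return path[on_path.index(node):]  # edges from node back to node
        if node in finished:
            return None
        for edge in by_waiter.get(node, []):
            found = dfs(edge.target, path + [edge])
            if found:
                return found
        finished.add(node)
        return None

    for start in list(by_waiter):
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

def diagnose(waits):
    """Return a diagnostic string for a deadlock, or None if no cycle exists."""
    cycle = find_cycle(waits)
    if cycle is None:
        return None
    culprits = [e for e in cycle if e.reason == "valueLength" and e.truncatable]
    if culprits:
        return ("Runtime SDE: cycle caused by dfdl:valueLength of truncatable "
                f"string '{culprits[0].target}'; consider fn:string-length")
    return "Runtime SDE: cyclic deadlock"

# The example from this issue: 'len' waits on valueLength of 'data',
# and the length of 'data' waits on the value of 'len'.
example = [
    Wait("len", "data", "valueLength", truncatable=True),
    Wait("data", "len", "elementValue"),
]
message = diagnose(example)
```

The specific diagnostic is produced only when a truncatable-string valueLength edge lies on the cycle itself, so calls to dfdl:valueLength that do not deadlock are unaffected.
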
> That would be ideal, but an acceptable interim solution might be:
> (a) have tutorials about cycles and include this example
> (b) disallow this at schema compile time as something Daffodil just doesn't
> allow, along with the fix, which is to use fn:string-length instead
> or both.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
