[ 
https://issues.apache.org/jira/browse/DAFFODIL-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Beckerle updated DAFFODIL-2722:
------------------------------------
    Priority: Minor  (was: Major)

> Add new dfdl:lengthKind 'dfdlx:patternMatch'
> --------------------------------------------
>
>                 Key: DAFFODIL-2722
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2722
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: Back End, Diagnostics, Front End
>    Affects Versions: 3.3.0
>            Reporter: Mike Beckerle
>            Priority: Minor
>
> I've now run into this problem with lengthKind 'pattern' many times: when 
> the regex does not match, the length silently comes back as 0. 
> I've finally run out of patience with it. 
> Consider the idiom used in mil-std-2045 and other related standards for 
> variable length strings with a max length. These use a convention where if 
> the max length is used, no terminator character follows. But if fewer than 
> the max number of characters are used, a DEL character is used as the 
> terminator.
> So, consider a zero-length string. This appears in the data stream as just a 
> DEL character.
> The standard idiom for a string with max length 20 would be this:
>   
> {code:xml}
> <!-- Up to 19 non-DEL characters followed by a DEL (not consumed), or exactly 20 characters. -->
> <xs:element name="value" dfdl:lengthKind="pattern"
>     dfdl:lengthPattern="[^\x7F]{0,19}(?=\x7F)|.{20}">
>   <xs:simpleType>
>     <xs:restriction base="xs:string">
>       <xs:maxLength value="20"/>
>     </xs:restriction>
>   </xs:simpleType>
> </xs:element>
> <!-- No terminator after a full-length value; otherwise a DEL terminates it. -->
> <xs:sequence dfdl:terminator="{if (fn:string-length(./value) eq 20) then '%ES;' else '%DEL;'}"/>
> {code}
>  
> Now consider what happens when this is encountered near end of file: no DEL 
> is found, and fewer than 20 characters remain. The data is short.
> However, DFDL gives us no way to tell the difference between this and the 
> situation where the data stream did in fact contain just a DEL to terminate a 
> zero-length string.
> In both cases we get a successful parse of the element named 'value'. 
> However, in the short data case, the terminator will then not be found and a 
> parse error will be issued indicating terminator not found.
> This is ok, but we would get a better diagnostic if the pattern match on the 
> element itself failed, because neither a DEL nor 20 characters were found. 
> Looking at the alternatives to improve this, one thing comes to mind:
> We add an assert at the start of the group: a dfdl:assert with testKind 
> 'pattern' that detects whether enough data is present to parse the field. 
> This works, but it matches the regex TWICE. The first regex match is purely 
> so we can tell apart the no-match case from the zero-length match case. 
> It works, but feels very heroic, as in way too complex. 
> {code:xml}
> <!-- Extra match, only to tell the no-match case apart from a zero-length match. -->
> <xs:sequence>
>   <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
>     <dfdl:assert testKind="pattern"
>         message="String not found. Neither DEL terminator, nor 20 characters could be parsed."
>         testPattern="[^\x7F]{0,19}(?=\x7F)|.{20}"/>
>   </xs:appinfo></xs:annotation>
> </xs:sequence>
> <!-- The same regex again, this time to determine the length of the value. -->
> <xs:element name="value" dfdl:lengthKind="pattern"
>     dfdl:lengthPattern="[^\x7F]{0,19}(?=\x7F)|.{20}">
>   <xs:simpleType>
>     <xs:restriction base="xs:string">
>       <xs:maxLength value="20"/>
>     </xs:restriction>
>   </xs:simpleType>
> </xs:element>
> <xs:sequence dfdl:terminator="{if (fn:string-length(./value) eq 20) then '%ES;' else '%DEL;'}"/>
> {code}
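
The difference between a failed match and a zero-length match, which the description above relies on, can be illustrated outside of DFDL. The following is only a rough sketch: Python's `re` module stands in for the regex engine Daffodil actually uses, and the function names `match_length`, `pattern_length`, and `strict_pattern_length` are hypothetical. The strict variant is an assumption about how a 'dfdlx:patternMatch' length kind might behave, inferred from the issue title, not a description of any implementation.

```python
import re

# Same regex as dfdl:lengthPattern in the schemas above.
LENGTH_PATTERN = re.compile(r"[^\x7F]{0,19}(?=\x7F)|.{20}")


def match_length(data):
    """Length matched at the start of data, or None if the regex does not match."""
    m = LENGTH_PATTERN.match(data)
    return None if m is None else len(m.group(0))


def pattern_length(data):
    """Behaviour described in the issue for lengthKind 'pattern':
    a failed match is reported as length 0."""
    n = match_length(data)
    return 0 if n is None else n


def strict_pattern_length(data):
    """Hypothetical stricter behaviour (assumption): a failed match is an error."""
    n = match_length(data)
    if n is None:
        raise ValueError(
            "String not found. Neither DEL terminator, nor 20 characters "
            "could be parsed."
        )
    return n


if __name__ == "__main__":
    for sample in ["\x7f", "abc\x7f", "abc", "a" * 20]:
        print(repr(sample), match_length(sample), pattern_length(sample))
```

With the lenient function, a lone DEL (a genuine zero-length string) and short data such as `"abc"` both come back as length 0, which is exactly the ambiguity described above. The strict variant runs the regex once and still separates the two cases.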



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
