[ https://issues.apache.org/jira/browse/DAFFODIL-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Beckerle updated DAFFODIL-2722:
------------------------------------
Priority: Minor (was: Major)
> Add new dfdl:lengthKind 'dfdlx:patternMatch'
> --------------------------------------------
>
> Key: DAFFODIL-2722
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2722
> Project: Daffodil
> Issue Type: New Feature
> Components: Back End, Diagnostics, Front End
> Affects Versions: 3.3.0
> Reporter: Mike Beckerle
> Priority: Minor
>
> I've repeatedly run into the problem with lengthKind 'pattern' where a
> failed match just silently returns a length of 0.
> I've finally run out of patience with it.
> Consider the idiom used in MIL-STD-2045 and other related standards for
> variable-length strings with a max length. These use a convention where, if
> the max length is used, no terminator character follows. But if fewer than
> the max number of characters are used, a DEL character is the terminator.
> So, consider a zero-length string. This appears in the data stream as just a
> DEL character.
> The standard idiom for a string of max length 20 would be this:
>
> {code:xml}
> <xs:element name="value" dfdl:lengthKind="pattern"
>     dfdl:lengthPattern="[^\x7F]{0,19}(?=\x7F)|.{20}">
>   <xs:simpleType>
>     <xs:restriction base="xs:string">
>       <xs:maxLength value="20"/>
>     </xs:restriction>
>   </xs:simpleType>
> </xs:element>
> <xs:sequence dfdl:terminator="{if (fn:string-length(./value) eq 20)
> then '%ES;' else '%DEL;'}"/>{code}
>
> Now consider what happens if this is encountered near end of file, where
> neither a DEL nor 20 characters are found. The data is short.
> However, DFDL gives us no way to tell the difference between this and the
> situation where the data stream did in fact contain just a DEL to terminate a
> zero-length string.
> In both cases we get a successful parse of the element named 'value'.
> However, in the short data case, the terminator will then not be found and a
> parse error will be issued indicating terminator not found.
> This is OK, but we would get a better diagnostic if the pattern match for
> the element itself failed, because neither a DEL nor 20 characters were
> found.
> When you look at the alternatives to improve this, one thing comes to mind:
> We add a sequence at the start of the group that carries a dfdl:assert with
> testKind 'pattern' to detect whether enough data is present to parse the
> field.
> This works, but it matches the regex TWICE. The first regex match is purely
> so we can tell apart the no-match case from the zero-length match case.
> It works, but feels very heroic, as in way too complex.
> {code:xml}
> <xs:sequence>
>   <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
>     <dfdl:assert testKind='pattern'
>         message="String not found. Neither DEL terminator, nor 20
> characters could be parsed."
>         testPattern="[^\x7F]{0,19}(?=\x7F)|.{20}"/>
>   </xs:appinfo></xs:annotation>
> </xs:sequence>
> <xs:element name="value" dfdl:lengthKind="pattern"
>     dfdl:lengthPattern="[^\x7F]{0,19}(?=\x7F)|.{20}">
>   <xs:simpleType>
>     <xs:restriction base="xs:string">
>       <xs:maxLength value="20"/>
>     </xs:restriction>
>   </xs:simpleType>
> </xs:element>
> <xs:sequence dfdl:terminator="{if (fn:string-length(./value) eq 20)
> then '%ES;' else '%DEL;'}"/>{code}
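
For illustration, the ambiguity described above can be reproduced outside DFDL with a small Python sketch. It assumes Python's `re` module handles this particular regex the same way a DFDL processor would, and the function names `pattern_length` and `strict_pattern_length` are hypothetical, not Daffodil APIs. The first emulates the lengthKind 'pattern' behavior as described in the issue (no match is reported as length 0); the second sketches a stricter alternative that treats no match as an error.

```python
import re

# Same regex as the dfdl:lengthPattern in the issue: up to 19 non-DEL
# characters followed by a DEL (lookahead only), or exactly 20 characters.
LENGTH_PATTERN = re.compile(r"[^\x7F]{0,19}(?=\x7F)|.{20}")


def pattern_length(data: str) -> int:
    """Emulate lengthKind 'pattern' as described above: no match yields 0."""
    m = LENGTH_PATTERN.match(data)
    return m.end() if m else 0


def strict_pattern_length(data: str) -> int:
    """Hypothetical stricter variant: a failed match is reported as an error."""
    m = LENGTH_PATTERN.match(data)
    if m is None:
        raise ValueError(
            "String not found. Neither DEL terminator, nor 20 characters "
            "could be parsed."
        )
    return m.end()
```

With these, `pattern_length("\x7F")` (a genuine zero-length string) and `pattern_length("abc")` (short data, no DEL) both give 0, so the caller cannot tell them apart. Only the short-data input makes `strict_pattern_length` raise, and the regex is matched just once.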