Ah. I'm glad you found this workaround.

The technique you are using is something I generally call "Modeling
Syntax As Data". Sometimes it is the best way to model this sort of
data, and it is a very powerful technique.

As you have no doubt noticed, you can't push these "syntax elements" down
into hidden groups. The whole point is that they must appear in the
infoset, because they carry the information needed to reproduce those
aspects of the data exactly on unparse.

Whitespace preservation is the most common place I've seen it needed.
Prefixed length binary integers are another example.
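
To make the idea concrete outside of DFDL, here is a minimal Python sketch.
It is not Daffodil code: the function names are made up for illustration, and
the 5-character field width is an assumption borrowed from the satellite-number
field discussed in this thread. It contrasts trimming the pad, which loses the
original layout, with keeping the pad as its own piece of data, which
round-trips exactly.

```python
FIELD_WIDTH = 5  # assumed fixed width, as in the satellite-number field


def parse_trimmed(field: str) -> int:
    """Trim leading spaces and keep only the numeric value (lossy)."""
    return int(field.lstrip(" "))


def unparse_trimmed(value: int) -> str:
    """Re-pad with spaces on the left; the original pad layout is a guess."""
    return str(value).rjust(FIELD_WIDTH, " ")


def parse_syntax_as_data(field: str) -> dict:
    """Keep the leading spaces as a visible item next to the digits."""
    digits = field.lstrip(" ")
    pad = field[: len(field) - len(digits)]
    return {"pad": pad, "digits": digits, "value": int(digits)}


def unparse_syntax_as_data(item: dict) -> str:
    """Rebuild the field from exactly what was captured on parse."""
    return item["pad"] + item["digits"]
```

For a field such as two spaces followed by 042, the trimmed round trip cannot
reproduce the input because the mix of spaces and leading zeros is gone, while
the second pair of functions returns the original five characters. The sketch
does not handle an all-blank field.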


On Thu, Mar 24, 2022 at 1:22 PM Attila Horvath <[email protected]>
wrote:

> ALCON
>
>  Appreciate the following suggested workaround:
>
> <xs:element name="satellite-num-range" type="xs:unsignedInt"
> dfdl:lengthKind="explicit" dfdl:length="5"
>   dfdl:textTrimKind="padChar" dfdl:textPadKind="padChar"
> dfdl:textNumberPadCharacter="%SP;" dfdl:textNumberJustification="right"
>   dfdl:textNumberPattern="####0"/>
>
> The snippet above does work when implemented, but it doesn’t yield lossless
> unparse results, which is still our goal.
>
> After [much] trial and error I came up with an alternative approach that
> deals with leading whitespace/zeros and produces lossless parse/unparse
> results:
> [image: image.png]
>
> The snippet above first instantiates an element of leading whitespace on
> line #134.
>
> Given the satellite number is fixed length of 5 characters, the hidden
> group isolates the numerical digits based on the number of whitespace
> characters and instantiates an element ‘satellite-num-range’ of variable
> length.
>
> This allows lossless string processing and the ability to convert string
> to unsignedInt and avoid conversion error due to leading whitespace(s).
>
>
>
> Attila
>
>
> PS: if you can't see embedded image above, see attached PDF.
>
>
> On Wed, Mar 16, 2022 at 12:21 PM Mike Beckerle <[email protected]>
> wrote:
>
>> Ok, I found the attachment. Sorry for the delay.
>>
>> The challenge here is that you are expecting the
>> xs:unsignedInt(../Line1.02-Satellite) call to tolerate whitespace, which
>> it seems it does not.
>>
>> I think this is a Daffodil bug, as the constructors like xs:unsignedInt are
>> supposed to work like they do in XPath, and the XPath functions spec says
>> when converting from strings, that whitespace normalization applies - which
>> trims all leading and trailing whitespace. It's less clear about whether
>> interior whitespace is collapsed, but definitely leading/trailing seem to
>> be trimmed.
>>
>> So I'll add a JIRA ticket about this.
>>
>> For how to work around, I suggest parsing the satellite field not as a
>> string, but as an unsignedInt from the start.
>>
>> So like:
>>
>> <xs:element name="satellite-num-range" type="xs:unsignedInt"
>> dfdl:lengthKind="explicit" dfdl:length="5"
>>   dfdl:textTrimKind="padChar" dfdl:textPadKind="padChar"
>> dfdl:textNumberPadCharacter="%SP;" dfdl:textNumberJustification="right"
>>   dfdl:textNumberPattern="####0"/>
>>
>> I didn't run this, but I think this will remove leading spaces on parse, and
>> add leading spaces back to your 5 character element on unparse.
>>
>> Another way to express this, since you need only leading padding is this:
>>
>> <xs:element name="satellite-num-range" type="xs:unsignedInt"
>> dfdl:lengthKind="explicit" dfdl:length="5"
>>   dfdl:textNumberPattern="* ####0"/>
>>
>> In that textNumberPattern the "* " means spaces are the pad character to be
>> used, and when there is no digit for the position of a "#" then the pad
>> character from the pattern (not the textNumberPadCharacter) is used.
>>
>> Both kinds of padding can be used together. For example, you could have
>> number text right-justified in a fixed-length field of width 6, using "*"
>> to pad to width 5, so that you get " **123".
>>
>> <xs:element name="starPadNum" type="xs:unsignedInt"
>> dfdl:lengthKind="explicit" dfdl:length="6"
>>   dfdl:textTrimKind="padChar" dfdl:textPadKind="padChar"
>> dfdl:textNumberPadCharacter="%SP;" dfdl:textNumberJustification="right"
>>   dfdl:textNumberPattern="**####0"/>
>>
>> I didn't run these, but this is, I believe, how it is supposed to work.
>>
>>
>>
>> On Tue, Mar 15, 2022 at 5:23 PM Attila Horvath <
>> [email protected]>
>> wrote:
>>
>> > Attachment can be found on
>> > https://lists.apache.org/[email protected] list.
>> > Not sure why it didn't show up on dev - I sent msg to both lists.
>> >
>> > On Tue, Mar 15, 2022 at 3:41 PM Mike Beckerle <[email protected]>
>> > wrote:
>> >
>> > > No attached PDF.
>> > >
>> > > Removal of blanks from a number would normally be by way of DFDL
>> > > "padding" and "trimming" properties.
>> > >
>> > > You need dfdl:textNumberJustification property, also textTrimKind,
>> > > textPadKind, textNumberPadCharacter.
>> > >
>> > > You can also use textNumberPattern to indicate that a number may have
>> > > leading spaces or zeros, but this is mostly about output when those
>> > > leading zeros are required.
>> > >
>> > > Padding on left == Right Justified
>> > > Padding on right == Left Justified
>> > > Padding around both sides = Center Justified.
>> > >
>> > >
>> > >
>> > > On Tue, Mar 15, 2022 at 8:53 AM Attila Horvath <
>> > [email protected]
>> > > >
>> > > wrote:
>> > >
>> > > > Ping... any assistance appreciated - thx
>> > > >
>> > > > ---------- Forwarded message ---------
>> > > > From: Attila Horvath <[email protected]>
>> > > > Date: Mon, Mar 14, 2022 at 12:38 PM
>> > > > Subject: string to integer conversion w/ leading blanks fails
>> > > > To: <[email protected]>, <[email protected]>
>> > > >
>> > > >
>> > > > ALCON
>> > > >
>> > > > Can someone pls suggest a way to convert string to integer if/when
>> > > > leading blanks are present?
>> > > >
>> > > > My attempts are failing - see attached pdf for more details.
>> > > >
>> > > > Thx in advance
>> > > >
>> > > > Attila
>> > > >
>> > >
>> >
>>
>
