Christoph Anton Mitterer wrote, on 25 Apr 2022:
>
> On Fri, 2022-04-22 at 12:03 +0100, Geoff Clare via austin-group-l at
> The Open Group wrote:
> > 
> > > 3) You introduce bytes/byte sequences vs. characters.
> > > 
> > > I don't understand why you need that at all?
> > 
> > The current wording in terms of characters implies that the word being
> > subjected to field splitting can be treated as a character string.
> > I wanted to ensure that there is no possible way to infer that as
> > being allowed by the new text.
> 
> Okay. Fair enough.
> 
> Do you think that similar explanations would be needed for the other
> expansions/substitutions?

I searched all of XCU chapter 2 for the word "character" when I worked
on my proposed changes for bugs 1560, 1561, 1562 and 1564, and I believe
the proposed changes cover all of the places that need changing.

> > > => But there is one thing that's IMO lost on the way:
> > > The old:
> > > "    any sequence of <space>, <tab>, or <newline> characters at the
> > > beginning or end of the input shall be ignored and any sequence of
> > > those characters within the input shall delimit a field"
> > > 
> > > "sequence of those characters" indicated that a sequence of 1-n IFS
> > > characters were still regarded as one single field splitter.
> > > 
> > > With the new:
> > > "ignored and any sequence of such bytes"
> > > that's IMO a bit lost... sequence of bytes is rather considered
> > > like ONE "multi-byte" character.
> > 
> > Each of the bytes in question encodes a single-byte character, so
> > it's impossible for them to combine to form one multi-byte character.
> 
> Ah... ok... I had arbitrary (multi-byte) characters for IFS in mind,
> but point (1) on page 2325 is only about the specific characters
> <space>, <tab> and <newline>.
> 
> But still, this (i.e. that these characters are per POSIX always one
> byte) is quite "special knowledge", so why not - similar to your change
> below - add a "(zero or more instances)" for the 1st case?

Okay.  I'll edit the note to add this.

> > > You don't have that problem with the 4th change, where you
> > > explicitly say:
> > > "any sequence (zero or more instances) of the byte sequences that
> > > comprise white-space characters"
> > 
> > I'll insert "(one or more instances)".
> 
> Your addition of "(one or more instances)" for the 2nd case is IMO
> quite good, as with "any" it's not always clear if it would also
> include the "empty element" (as it does in principle for the 1st
> case)... and here it clearly means "at least one".
> 
> 
> Btw: Doesn't it need to say "any sequence of <space>, <tab>, AND
> <newline>" instead of "... or ..."?
> 
> In my understanding... the "any of" gives one the arbitrary choice of
> elements/order of the "set" described afterwards.
> If that "set" is "<space>, <tab>, OR <newline>" then I'd rather read it
> as "either of them"... so it would be "any seq. of space OR any seq. of
> tab OR any seq. of newline"?

I see your point, but it seems to me you have to overthink it before
there is any potential for misinterpretation.  Also, I think using
"and" instead of "or" would be more likely to be misinterpreted.  It
could be read as implying that the sequence has to have at least one
<space>, at least one <tab>, and at least one <newline>.

The current wording with "or" has been in POSIX since the first shell
and utilities standard (1992) so I'm inclined to stay with that.

In any case, the example prevents misinterpretation.
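
As a rough illustration of that reading (a sketch only, assuming the
default IFS of <space>, <tab> and <newline>; the variable names are made
up for this example and do not come from the standard), a mixed run of
the three characters acts as a single delimiter, and runs at the
beginning and end of the input are ignored:

```shell
# Assumption: default IFS, i.e. <space>, <tab> and <newline>.
unset IFS

tab=$(printf '\t')
nl='
'
# Fields separated by mixed runs of <space>, <tab> and <newline>,
# with further runs at the beginning and end of the input.
input=" ${tab}${nl}alpha ${tab}${nl} beta${nl}${nl}${tab}gamma ${nl}"

# Unquoted expansion, so the word is subject to field splitting.
set -- $input
count=$#
first=$1
second=$2
third=$3

printf '%s fields: <%s> <%s> <%s>\n' "$count" "$first" "$second" "$third"
```

With the "or" wording read as intended, this should report three fields
(alpha, beta, gamma) rather than requiring each run to consist of only
one kind of character.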

> On Fri, 2022-04-22 at 12:13 +0100, Geoff Clare via austin-group-l at
> The Open Group wrote:
> > Geoff Clare wrote, on 22 Apr 2022:
> > > 
> > > I'll insert "(one or more instances)".
> > 
> > And having done so, the use of "zero or more instances" in the
> > paragraph about IFS white space now seems wrong to me.  I think it
> > should be one or more as well.
> 
[...]
> 
> For (3c): "Non-zero-length IFS white space shall delimit a field."
> 
> 0-n would IMO be ok, as it specifically says "non-zero-length" ...
> which might be obsolete with your change.
> 
> 
> So I think that it needs to stay "zero or more"... or "any" needs to
> be clarified... or the text for (3b) changed somehow.

I should have looked at the whole text - I was thinking about the
definition of IFS white space in isolation, and having an empty string
be IFS white space seemed odd to me, but it is clear from this use of
"Non-zero-length IFS white space shall ..." that this was intentional
for whatever reason, so it would be best not to mess with it.

I'll edit the note to change it back to "zero".
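
For what it's worth, the way IFS white space combines with a
non-white-space IFS character can be sketched as below.  This is only an
illustration, assuming IFS is set to <space> and ':'; count_fields is a
hypothetical helper for this example, not anything from the standard.

```shell
# Hypothetical helper: split its argument with IFS set to <space> and ':'
# and print the number of resulting fields.  The body is a subshell, so
# the IFS and "set -f" changes do not leak into the caller.
count_fields() (
    IFS=' :'
    set -f          # disable pathname expansion; only field splitting is of interest
    set -- $1       # unquoted, so the argument undergoes field splitting
    echo "$#"
)

# A run of IFS white space alone delimits a field.
count_fields 'a   b'

# IFS white space around a single ':' forms one delimiter together with it.
count_fields 'a : b'

# Two adjacent ':' with no white space between them delimit an empty field.
count_fields 'a::b'
```

The first two calls should print 2 and the last should print 3, which is
the distinction between non-zero-length IFS white space and the
non-white-space IFS characters.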

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
