On 15/04/2022 04:57, Christoph Anton Mitterer wrote:
On Fri, 2022-04-15 at 00:44 +0100, Harald van Dijk via austin-group-l
at The Open Group wrote:
Hmm, I would.

I like that :-D This would have been the preferred alternative that I
asked to have looked at in the ticket.



Shells are not in agreement on whether such single bytes can be matched
with [...], nor in those shells where they can be, whether multiple
bracket expressions can be used to match the individual bytes of a
valid multi-byte character.

The cases with [...] only come up when scripts themselves use patterns
that are not valid character strings

You mean in the lexical locale?

I do not, but interesting question. I am one of the few, if not the only, shell authors that actually implemented the "Changing the value of LC_CTYPE after the shell has started shall not affect the lexical processing of shell commands in the current shell execution environment or its subshells" rule. Even I did not apply this to pattern matching. The lexical locale, the locale used for lexing, is only used for lexing, i.e. for recognising tokens, not for how those tokens are then interpreted later on. If locale comes into play for that, as it does in pattern matching, it is the then-current value of LC_CTYPE that comes into play, as it does in other shells.
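
A minimal sketch of that distinction, not taken from any shell's test suite: the patterns below are tokenised once, when the function is defined, but the match is evaluated with whatever locale is current when the function is called. The helper name pattern_len is made up for illustration, LC_ALL is used instead of LC_CTYPE only so that an inherited LC_ALL cannot override the change, and the locale name C.UTF-8 is an assumption that may not exist on every system.

```shell
# Hypothetical helper: prints how many "?" are needed to match all of $1
# under the locale in effect at the time of the call.
pattern_len() {
    case $1 in
        '')  echo 0 ;;
        ?)   echo 1 ;;
        ??)  echo 2 ;;
        *)   echo more ;;
    esac
}

# The two bytes that encode U+00E9 in UTF-8.
e_acute=$(printf '\303\251')

# C locale: every byte is a character by itself.
LC_ALL=C
pattern_len "$e_acute"

# Assumed locale name; a shell without multi-byte support, or a system
# without this locale, may print the same result as above.
LC_ALL=C.UTF-8
pattern_len "$e_acute"
```

If the two calls print different results, the shell is using the then-current locale for pattern matching rather than the locale that was in effect when the patterns were lexed.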

they are unlikely to affect existing scripts and I imagine there is not
much harm in leaving those unspecified.

It should however be clearly described that behaviour in this field is
undefined, perhaps with some "future directions" that this might change
some day.

I also prefer explicit over implicit. Perhaps it does not even need to be undefined, though; perhaps unspecified with a few limited options is good enough. I am not sure at this time whether that is feasible.

As for future directions, no opinion on that from me.

The cases with * and ? do come up in existing scripts, but if shells
are in agreement as they appear to be, there is no need to coordinate
with shell authors on whether they would be willing to change their
implementations, it is possible to change POSIX to describe the
shells' current behaviour.

Well, but it's not only * and ? ... it's also a single character
matching that character in a byte string that contains bytes or
sequences thereof which do not form any valid character ... either
before or after the character to be matched.

Yes, I did mention those earlier on in my message but forgot to repeat it here. It's where shells also appear to be in agreement, except in the same corner case that also applies to [...] where an invalid byte in a pattern is used to match part of a valid character in the string.
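
As a rough sketch of the kind of input meant here (the helper name has_a is made up for illustration, and \377 is chosen only because it is never part of a valid UTF-8 sequence), a literal character can be looked for in strings where an invalid byte comes before it, after it, or both:

```shell
# Hypothetical helper: reports whether the literal character "a" is found in $1.
has_a() {
    case $1 in
        *a*) echo match ;;
        *)   echo no-match ;;
    esac
}

# Invalid byte before, after, and on both sides of the character.
has_a "$(printf '\377a')"
has_a "$(printf 'a\377')"
has_a "$(printf '\377a\377')"
```

The results for these three calls are what would need to be compared across shells and locales; no particular output is claimed here.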

And since pattern matching notation isn't just used for matching alone,
but e.g. also for string manipulation in parameter expansion (e.g.
"${foo%.}" case)... these shells would also need to agree how to handle
that, wouldn't they?

I would not think this should be a special case: «${foo%.}» should strip a trailing «.» in exactly those cases where the shell considers foo to match the pattern «*.». However, I can see value in doing some extra tests to verify that this matches what shells do.
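
A sketch of how such a check could look, assuming a made-up helper name check_agree: it compares whether «${foo%.}» changed the value against whether the value matches «*.», and reports whether the two agree.

```shell
# Hypothetical helper: reports whether "${foo%.}" strips a trailing "."
# in exactly the cases where the value matches the pattern "*.".
check_agree() {
    foo=$1
    case $foo in
        *.) matched=yes ;;
        *)  matched=no ;;
    esac
    if [ "${foo%.}" != "$foo" ]; then
        stripped=yes
    else
        stripped=no
    fi
    if [ "$matched" = "$stripped" ]; then
        echo agree
    else
        echo differ
    fi
}

check_agree 'abc.'
check_agree 'abc'
# A byte that is not a valid character in UTF-8 locales, followed by a dot.
check_agree "$(printf '\377.')"
```

A "differ" line for any input would point at a shell that treats parameter expansion as a special case.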

If there is interest in getting this standardised, I can spend some
more time on creating some hopefully comprehensive tests for this to
confirm in what cases shells agree and disagree, and use that as a
basis for proposing wording to cover it.

I'd love to see that, and if you'd actually do so, I'd kindly ask
Geoff to defer any changes in the ticket #1564 of mine, until it can be
said whether it might be possible to get that standardised.

Very well, I will post tests and test results as soon as I can make the time for it.

Cheers,
Harald van Dijk
