Harald van Dijk <a...@gigawatt.nl> wrote, on 25 Sep 2019:
>
> After comparing what my shell does now during pattern matching to what it
> should, I found a few more cases where I do not believe POSIX is clear about
> what is required and where shells are not in agreement. These are not
> related to the backslash handling.

This isn't a complete response to all the points - I'm just noting
some things that I don't think other responders have mentioned.

> 1a. Invalid character classes:
> 
>   case x in [x[:bogus:]]) echo x ;; esac # bash,bosh,mksh,nbsh,osh,zsh
>   case x in [![:bogus:]]) echo x ;; esac # above except osh
> 
> The handling of this in dash, inherited by my shell, is just buggy and
> should be ignored.
> 
> In bash, bosh, mksh, nbsh, zsh, a character does not match an invalid
> character class. In osh, a character neither matches nor fails to match an
> invalid character class, but the pattern is still valid. In yash, the use of
> [:bogus:] renders the whole pattern invalid.
> 
> These all seem reasonable choices. regcomp() would reject the whole pattern
> as an error, and character classes are supposed to behave as they do in
> regular expressions, so I believe yash's behaviour makes the most sense. Is
> that correct?

The key here is the way 2.13.1 words the description of '[':

    If an open bracket introduces a bracket expression as in XBD
    Section 9.3.5, except [...]. Otherwise, '[' shall match the
    character itself.

(This wording is being improved via bug 985 but that change does not
affect how it applies here.)

If "bogus" is not a valid character class for the current locale,
then the "If" is not satisfied and [x[:bogus:]] is treated as a
literal [, a literal x, the bracket expression [:bogus:] and a
literal ].

XBD 9.3.5 item 8 says it is unspecified whether [:bogus:] is treated as
a character class, treated as a matching list expression, or rejected
as an error.  If it is not treated as a matching list, then the "If" in
2.13.1 is again not satisfied and [:bogus:] is treated as a sequence
of literal characters.
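
A minimal sketch for checking which of these readings a given shell
follows. The helper name `matches` is hypothetical (it is not from this
thread), and the output depends on the shell running it. Under the
first reading above, the subject `[xb]` matches and `x` does not. In the
shells listed against the first example, `x` matches.

```shell
# Hypothetical helper (not from the thread): succeed if string $1 matches
# case pattern $2. eval splices the pattern in unquoted, so the shell
# parses it as if it had been written directly in the case statement.
matches() {
    eval "case \$1 in $2) true ;; *) false ;; esac"
}

# Report how the running shell treats an unknown class name.
pat='[x[:bogus:]]'
for subject in 'x' '[xb]'; do
    if matches "$subject" "$pat"; then
        printf '%s matches %s\n' "$subject" "$pat"
    else
        printf '%s does not match %s\n' "$subject" "$pat"
    fi
done
```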

> 1b. Quoted character classes:
> 
> Shells agree that quoting disables the recognition of character classes, but
> they disagree on how much quoting disables it.
> 
>   case x in ["[:alnum:]"]) echo x ;; esac # none
>   case x in [[:"alnum:]"]) echo x ;; esac # none
>   case x in [[:"alnum:"]]) echo x ;; esac # ksh, mksh, yash, zsh
>   case x in [[:\alnum:]])  echo x ;; esac # above plus osh
>   case x in [[:"alnum":]]) echo x ;; esac # above plus dash, nbsh
> 
> I believe that as the special characters to indicate a character class are
> "[:" and ":]", the osh behaviour is correct, the character class name is
> allowed to be quoted. Is that correct? The dash/nbsh behaviour, again
> inherited by my shell, is close, but the fact that the type of quoting
> affects how the character class is treated looks like a bug.

Some shells are known not to handle shell quoting correctly in bracket
expressions (in general, not specific to character classes).  I think
this came to light during discussion of bug 1190.  I seem to recall ksh93
being the main culprit, but other shells may have had bugs as well.
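
As a rough illustration of the general case (outside character
classes), the sketch below probes a quoted '-' inside a bracket
expression. It assumes the reading that a quoted '-' is a literal
character rather than a range operator, so a shell that honours the
quoting should match `a`, `-` and `c` but not `b`. The helper name
`matches` is hypothetical.

```shell
# Hypothetical helper: succeed if string $1 matches case pattern $2.
# The pattern is spliced in via eval so that the quotes inside it are
# seen by the shell parser, as they would be in a literal case statement.
matches() {
    eval "case \$1 in $2) true ;; *) false ;; esac"
}

# Assumption: with the '-' quoted, the set is {a, -, c}, not the range a-c.
pat='[a"-"c]'
for subject in a b -; do
    if matches "$subject" "$pat"; then
        printf '%s matches %s\n' "$subject" "$pat"
    else
        printf '%s does not match %s\n' "$subject" "$pat"
    fi
done
```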

> 2. Collating symbols and equivalence classes
> 
> Collating symbols and equivalence classes are less widely implemented.
> 
>   case x in [[.x.]]) echo x ;; esac # bash, ksh, mksh, osh, yash
>   case x in [[=x=]]) echo x ;; esac # same
>   case ä in [[=a=]]) echo x ;; esac # bash, ksh, yash
>   case a in [[=ä=]]) echo x ;; esac # same
> 
> The handling of brackets in pattern matching is defined by reference to RE
> Bracket Expression and no exception has been made for them, so these are
> supposed to be handled in pattern matching as well.
> 
> 2a. Multi-character collating symbols and equivalence classes
> 
> Multi-character support seems impossible to implement portably other than by
> translating patterns to regular expressions as yash does. POSIX does not
> provide any other means to ask the implementation enough information about
> what is supported in the current locale. And when things do get translated
> to regular expressions, it relies on libc support, with glibc behaving
> strangely, but this may just be my limited understanding of how things are
> supposed to work.
> 
>   LANG=cy_GB.UTF-8
>   case  ch in  [[=ch=]]) echo x ;; esac # none
>   case  ch in  [[.ch.]]) echo x ;; esac # yash
>   case xch in x[[=ch=]]) echo x ;; esac # yash
> 
> Are shells required to support this, and are shells therefore implicitly
> required to translate patterns to regular expressions, or should it be okay
> to implement this with single character support only?

Shells are required to support it.  They don't need to translate
entire patterns to regular expressions - they can use either
regcomp()+regexec() or fnmatch() to see if the bracket expression
matches the next character.

> 2b. Invalid collating elements
> 
> As with invalid character classes:
> 
>   case x in [x[.xy.]]) echo x ;; esac # bash, ksh, mksh
> 
> This would be rejected with an error by regcomp(), so rejecting the whole
> pattern makes most sense to me. This appears to be what osh is doing as
> well, in a change from how it handles invalid character classes, and as
> expected it is what yash does. Is it the right approach?

Same answer as for invalid character classes.

> 2c. Quoted equivalence classes and collating symbols
> 
> The same question of quoting applies to these, but here too osh no longer
> behaves the way it did with character classes:
> 
>   case x in [[="x="]]) echo x ;; esac # ksh, mksh, osh, yash
>   case x in [[."x."]]) echo x ;; esac # same
> 
> I believe this is incorrect for the same reason as the quoting in character
> classes. The quoting of "x" should be okay, but the quoting of "=" or "."
> should disable the recognition as an equivalence class or collating symbol,
> so the meaning of the pattern [[="x="]] should change to "one of [=x=,
> followed by ]", just like how the pattern [["=x="]] is treated already. Does
> that sound right?

Again, this could be just one aspect of the general bugginess of some shells
regarding shell quoting in bracket expressions.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
