Harald van Dijk <a...@gigawatt.nl> wrote, on 25 Sep 2019:
>
> After comparing what my shell does now during pattern matching to what it
> should, I found a few more cases where I do not believe POSIX is clear about
> what is required and where shells are not in agreement. These are not
> related to the backslash handling.

This isn't a complete response to all the points - I'm just noting some
things that I don't think other responders have mentioned.

> 1a. Invalid character classes:
>
> case x in [x[:bogus:]]) echo x ;; esac # bash,bosh,mksh,nbsh,osh,zsh
> case x in [![:bogus:]]) echo x ;; esac # above except osh
>
> The handling of this in dash, inherited by my shell, is just buggy and
> should be ignored.
>
> In bash, bosh, mksh, nbsh, zsh, a character does not match an invalid
> character class. In osh, a character neither matches nor fails to match an
> invalid character class, but the pattern is still valid. In yash, the use of
> [:bogus:] renders the whole pattern invalid.
>
> These all seem reasonable choices. regcomp() would reject the whole pattern
> as an error, and character classes are supposed to behave as they do in
> regular expressions, so I believe yash's behaviour makes the most sense. Is
> that correct?

The key here is the way 2.13.1 words the description of '[':

    If an open bracket introduces a bracket expression as in XBD
    Section 9.3.5, except [...]. Otherwise, '[' shall match the
    character itself.

(This wording is being improved via bug 985 but that change does not
affect how it applies here.)

If "bogus" is not a valid character class for the current locale,
then the "If" is not satisfied and [x[:bogus:]] is treated as
a literal [, a literal x, the bracket expression [:bogus:] and a
literal ].

XBD 9.3.5 item 8 says it is unspecified whether [:bogus:] is treated
as a character class, treated as a matching list expression, or rejected
as an error. If it is not treated as a matching list, then the "If"
in 2.13.1 is again not satisfied and [:bogus:] is treated as a sequence
of literal characters.

> 1b. Quoted character classes:
>
> Shells agree that quoting disables the recognition of character classes, but
> they disagree on how much quoting disables it.
>
> case x in ["[:alnum:]"]) echo x ;; esac # none
> case x in [[:"alnum:]"]) echo x ;; esac # none
> case x in [[:"alnum:"]]) echo x ;; esac # ksh, mksh, yash, zsh
> case x in [[:\alnum:]]) echo x ;; esac # above plus osh
> case x in [[:"alnum":]]) echo x ;; esac # above plus dash, nbsh
>
> I believe that as the special characters to indicate a character class are
> "[:" and ":]", the osh behaviour is correct, the character class name is
> allowed to be quoted. Is that correct? The dash/nbsh behaviour, again
> inherited by my shell, is close, but the fact that the type of quoting
> affects how the character class is treated looks like a bug.

Some shells are known not to handle shell quoting correctly in bracket
expressions (in general, not specific to character classes). I think
this came to light during discussion of bug 1190. I seem to recall
ksh93 being the main culprit, but other shells may have had bugs as
well.

> 2. Collating symbols and equivalence classes
>
> Collating symbols and equivalence classes are less widely implemented.
>
> case x in [[.x.]]) echo x ;; esac # bash, ksh, mksh, osh, yash
> case x in [[=x=]]) echo x ;; esac # same
> case ä in [[=a=]]) echo x ;; esac # bash, ksh, yash
> case a in [[=ä=]]) echo x ;; esac # same
>
> The handling of brackets in pattern matching is defined by reference to RE
> Bracket Expression and no exception has been made for them, so these are
> supposed to be handled in pattern matching as well.
>
> 2a. Multi-character collating symbols and equivalence classes
>
> Multi-character support seems impossible to implement portably other than by
> translating patterns to regular expressions as yash does. POSIX does not
> provide any other means to ask the implementation enough information about
> what is supported in the current locale.
> And when things to get translated
> to regular expressions, it relies on libc support, with glibc behaving
> strangely, but this may just be my limited understanding of how things are
> supposed to work.
>
> LANG=cy_GB.UTF-8
> case ch in [[=ch=]]) echo x ;; esac # none
> case ch in [[.ch.]]) echo x ;; esac # yash
> case xch in x[[=ch=]]) echo x ;; esac # yash
>
> Are shells required to support this, and are shells therefore implicitly
> required to translate patterns to regular expressions, or should it be okay
> to implement this with single character support only?

Shells are required to support it. They don't need to translate
entire patterns to regular expressions - they can use either
regcomp()+regexec() or fnmatch() to see if the bracket expression
matches the next character.

> 2b. Invalid collating elements
>
> As with invalid character classes:
>
> case x in [x[.xy.]]) echo x ;; esac # bash, ksh, mksh
>
> This would be rejected with an error by regcomp(), so rejecting the whole
> pattern makes most sense to me. This appears to be what osh is doing as
> well, in a change from how it handles invalid character classes, and as
> expected it is what yash does. Is it the right approach?

Same answer as for invalid character classes.

> 2c. Quoted equivalence classes and collating symbols
>
> The same question of quoting applies to these, but here too osh no longer
> behaves the way it did with character classes:
>
> case x in [[="x="]]) echo x ;; esac # ksh, mksh, osh, yash
> case x in [[."x."]]) echo x ;; esac # same
>
> I believe this is incorrect for the same reason as the quoting in character
> classes. The quoting of "x" should be okay, but the quoting of "=" or "."
> should disable the recognition as an equivalence class or collating symbol,
> so the meaning of the pattern [[="x="]] should change to "one of [=x=,
> followed by ]", just like how the pattern [["=x="]] is treated already. Does
> that sound right?

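The quoted 2c cases are easy to re-check against whatever shells are
installed. What follows is only a rough sketch: probe is a hypothetical
helper (it is not from this thread), and it assumes that going through
eval is acceptable, so that quotes inside the pattern act as shell
quoting instead of arriving as literal characters from a variable
expansion. No expected results are shown, because the outcome depends
on the shell being tested.

```shell
# probe WORD PATTERN-SOURCE  (hypothetical helper, for illustration only)
# Prints "match" if WORD matches the case pattern whose source text is
# PATTERN-SOURCE, and "no match" otherwise.
probe() {
    # The subshell keeps a shell that rejects the pattern outright from
    # aborting the caller; any diagnostics are discarded.
    if (eval "case \$1 in $2) exit 0 ;; esac; exit 1") 2>/dev/null
    then
        echo "match"
    else
        echo "no match"
    fi
}

# Unquoted baseline first, then progressively more of the class quoted.
for p in '[[=x=]]' '[[="x"=]]' '[[="x="]]' '[["=x="]]' \
         '[[.x.]]' '[[."x".]]' '[[."x."]]'
do
    printf '%-12s %s\n' "$p" "$(probe x "$p")"
done
```

Running the same file under each shell of interest gives directly
comparable columns. A shell that treats the pattern as invalid and one
that simply fails to match both show up as "no match", so this sketch
cannot tell those two behaviours apart.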
Again, this could be just one aspect of the general bugginess of some
shells regarding shell quoting in bracket expressions.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England