Robert Elz <k...@munnari.oz.au> wrote, on 26 Sep 2019:
>
> So, if bogus is not a valid char class for the locale (and if that is
> treated as meaning the [:...:] is not a character class element of the
> bracket expression, then the bracket expression is
>       [x[:bogus:]
> where all chars between the initial '[' and the terminating ']' are
> simply literal chars.  So this will match one char that is any of
>       : [ b g o s u x
> and the pattern will match a word that starts with one of those 7 chars
> and is followed by a ']' char.

Good point.  I think that this, and the behaviour I described, are
both allowed by the standard.

> | XBD 9.3.5 item 8 says it is unspecified whether [:bogus:] is treated as
> | a character class, treated as a matching list expression, or rejected
> | as an error.
>
> Yes, that is unfortunate, it should be specified that an unknown (but
> syntactically valid) class name in a character class is simply to be
> treated as a class containing no characters,

Item 8 isn't about what's between the ':'s in [[:...:]], it's about
an RE that contains [:...:] without the outer pair of square brackets.
I.e. it is unspecified whether [:alpha:] is treated as [[:alpha:]],
treated as [:alph], or rejected.

> | > 1b. Quoted character classes:
>
> | Some shells are known not to handle shell quoting correctly in bracket
> | expressions (in general, not specific to character classes).
>
> This issue is specific to character classes (and is subtly different
> than equivalence classes and collating symbols, as the syntax of the
> name is defined, so we know quoting is never actually required for it,
> unlike the others ... though I don't really believe that should make
> a difference.
>
> The question is whether [:"alpha":] is the same as [:alpha:] or not.

My point was that ksh93 treats [a"-"b] the same as [a-b] so trying to
test something more specific to do with character classes in ksh93
is not going to yield any useful information.

> | > 2a. Multi-character collating symbols and equivalence classes
> | >
>
> | > LANG=cy_GB.UTF-8
> | > case ch in [[=ch=]]) echo x ;; esac # none
> | > case ch in [[.ch.]]) echo x ;; esac # yash
> | > case xch in x[[=ch=]]) echo x ;; esac # yash
>
> | Shells are required to support it.  They don't need to translate
> | entire patterns to regular expressions - they can use either
> | regcomp()+regexec() or fnmatch() to see if the bracket expression
> | matches the next character.

(I later corrected this to "matches at the next character")

>
> The question here relates to "next character" - in the "case ch" where
> the word being matched is "ch" is that one character, or two?  A bracket
> expression matches just one, but an equivalence class may, as I understand
> it, include diphthongs (so u-umlaut and ue might be treated the same, where
> the former is one character, and the latter is two).
>
> Harald's question is whether shells are required to attempt to match
> such things, rather than just "matches the next character" ?

My previous reply was based on XBD 9.3.5 item 4, but I have just
spotted that the intro paragraph of 9.3.5 uses the word "may":

    A bracket expression ... is an RE that shall match a specific set
    of single characters, and may match a specific set of
    multi-character collating elements, ...

So it appears that it is optional whether matching a bracket
expression against more than one character is supported.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
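
The cases discussed in this message can be probed with a small script
along the following lines. This is only a sketch: the helper name
probe is invented here for illustration, the full pattern for the
"bogus" case is assumed to be [x[:bogus:]], and the cy_GB.UTF-8 locale
may not be installed on a given system. Only the first two calls have a
result that the standard requires; the others merely report which of
the permitted behaviours the shell under test happens to implement.

```sh
#!/bin/sh
# probe PATTERN WORD: print "match" or "no match" for this shell.
# (probe is a hypothetical helper name, not something from the thread.)
# An unquoted expansion in a case pattern is treated as a pattern.
probe() {
    case $2 in
        $1) echo "match" ;;
        *)  echo "no match" ;;
    esac
}

# Required results: a valid class inside a bracket expression.
probe '[[:alpha:]]' a      # must print "match"
probe '[[:alpha:]]' 1      # must print "no match"

# Unknown class name.  If [x[:bogus:] is read as a list of literal
# chars, the word "b]" matches; if [:bogus:] is read as an empty class,
# only "x" matches.  Erroring out is a further possibility.
probe '[x[:bogus:]]' 'b]'
probe '[x[:bogus:]]' x

# XBD 9.3.5 item 8: [:alpha:] without the outer brackets.  "z" matches
# only if it is read as [[:alpha:]]; ":" matches only if it is read as
# the matching list [:alph].
probe '[:alpha:]' z
probe '[:alpha:]' :

# Multi-character collating element, run in a subshell so the locale
# change does not leak.  Without a cy_GB.UTF-8 locale installed these
# results say nothing about the shell's support.
(
    LC_ALL=cy_GB.UTF-8
    export LC_ALL
    probe '[[.ch.]]' ch
    probe '[[=ch=]]' ch
) 2>/dev/null
```

Running the same file under each shell of interest and comparing the
output line by line shows where the implementations diverge; differing
output for the later probes is not by itself evidence of
non-conformance.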