You've already answered it, thank you. I didn't know that [:, [. and [= were special *sequences*; I guess I overlooked that part. Thanks again for taking the time to explain it in detail, I'm grateful.

On Saturday, 9 November 2019, Robert Elz <k...@munnari.oz.au> wrote:

>     Date:        Sat, 9 Nov 2019 07:35:16 +0300
>     From:        Oğuz <oguzismailuy...@gmail.com>
>     Message-ID:  <cah7i3lr68civxlr9_hoogqa7vd-zyvz+fck-0k3uqptnsir...@mail.gmail.com>
>
>   | is correct, as "foo" does not contain a ']' which would be required
>   | > to match there (quoting the ':' means there is no character class,
>   | > hence we have instead (the negation of) a char class containing '[' ':'
>   | > 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
>   | > followed by ']' and anything. foo does not match. f]oo would.
>   | >
>   | where exactly is this documented in the standard?
>
> I'm not sure which part exactly you're looking for, but char sets in sh
> are specified to be the same as in REs, except that ! replaces ^ as the
> negation character (that's in XCU 2.13.1). Char sets (bracket expressions)
> in REs are documented in XBD 9.3.5 wherein it states
>
>     A bracket expression is either a matching list expression or a
>     non-matching list expression. It consists of one or more expressions:
>     ordinary characters, collating elements, collating symbols,
>     equivalence classes, character classes, or range expressions.
>     The <right-square-bracket> (']') shall lose its special meaning and
>     represent itself in a bracket expression if it occurs first in the list
>     (after an initial <circumflex> ('^'), if any).
>
>     Otherwise, it shall terminate the bracket expression,
>
> That is, a ']' that occurs anywhere else terminates the bracket expression
> except:
>
>     unless it appears in a collating symbol (such as "[.].]")
>
> (not relevant in the given example)
>
>     or is the ending <right-square-bracket> for a collating symbol,
>     equivalence class, or character class.
>
> So the ']' that immediately follows the second ':' would not terminate the
> bracket expression if it is the ending ']' for a character class
> (collating symbols and equiv classes not being relevant to the example).
> Of course, that can only happen if there is a character class to end.
>
> There's also
>
>     The special characters '.', '*', '[', and '\\'
>     (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>     respectively) shall lose their special meaning within a bracket
>     expression.
>
> whereupon if the [": sequence does not start a char class, the '[' there
> is simply a literal char inside the bracket expression.
>
> Similarly if the bracket expression ends at the first ']' (the one
> immediately after the second ':') the following ']' is simply a literal
> character, as ']' chars are special only when following a '['.
>
> So, all that's left to determine is whether the [": sequence can be
> considered as beginning a char class.
>
> In a RE it certainly cannot - quote chars (' and ") are not special in
> REs at all, and [": is no different syntactically than [x: which no-one
> would treat as being the introduction to a char class.
>
> This is also, I believe (Chet can confirm, or refute, if he desires) where
> bash gets the interpretation that "lower" (including the quotes) is the
> name of the char class in [:"lower":] except that it cannot be, as char
> class names cannot contain quote characters (which should lead to the
> whole sub-expression not being treated as a char class at all, instead
> bash treats it, I think, as if it were an unknown but valid class name).
>
> But when it comes from sh, quote chars are "different" and instead of
> just being characters, they instead affect the interpretation of the
> characters that are quoted. See XCU 2.2:
>
>     Quoting is used to remove the special meaning of certain characters
>     or words to the shell.
>
>     Quoting can be used to preserve the literal meaning of the special
>     characters in the next paragraph [...]
>
>     and the following may need to be quoted under certain circumstances.
>     That is, these characters may be special depending on conditions
>     described elsewhere in this volume of POSIX.1-2017:
>
>         * ? [ # ~ = %
>
> to which more chars have been added (as I recall) recently by some
> Austin Group correction (which I think includes ! : - and ]), that is
> to make it clear, that in sh
>
>     [a'-'z]
>
> is a bracket expression containing 3 chars 'a' '-' and 'z' (which form
> of quoting is used to remove the specialness of the '-' is irrelevant),
> and that "[a-z]" isn't a bracket expression at all (neither of which
> is true in an RE - though the role of \ in REs is being altered slightly
> so if it had been [a\-z] in a RE things are less clear.)
>
> The effect of this is that in sh, in an expression like
>
>     [![":lower":]]
>
> the first ':' is not "special" and hence cannot form part of the
> magic opening '[:' sequence for a character class. Hence this
> expression contains no character class, and consequently the
> ':]' chars are simply a ':' in the bracket expression, and then
> the terminating ']' - which leaves the second ']' being just a
> literal character.
>
>
> While here (these following parts are not relevant to your question I
> believe) when used in sh
>
>     [[:"lower":]]
>
> should be treated just the same as
>
>     [[:lower:]]
>
> for the same reason that
>
>     ["abc"]
>
> is treated the same as
>
>     [abc]
>
> That is, quoted characters that are not special are no different
> than the same character unquoted. That's universal in sh, quoting
> removes special meaning (of lots of things) but where there was none
> the quoting changes nothing at all, eg:
>
>     "ls" \-'l'
>
> is exactly the same as
>
>     ls -l
>
> and
>     x="foo" y=''
> is identical to
>     x=foo y=
> (though not all empty quoted strings are irrelevant that way).
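
For anyone who wants to try the quoting rules described above, here is a minimal sh sketch. It only covers the three uncontroversial cases ([a'-'z], "[a-z]" and ["abc"]). The function names are made up for illustration and are not part of any shell or of the standard.

```shell
#!/bin/sh
# Illustrative sketch only; the helper names below are hypothetical.

# [a'-'z]: the quoted '-' is literal, so the set holds 'a', '-' and 'z',
# not the range a through z.
in_quoted_dash_set() {
    case $1 in
        [a'-'z]) echo yes ;;
        *)       echo no ;;
    esac
}

# "[a-z]": the whole pattern is quoted, so it is not a bracket expression;
# it matches only the five-character string [a-z] itself.
is_literal_brackets() {
    case $1 in
        "[a-z]") echo yes ;;
        *)       echo no ;;
    esac
}

# ["abc"]: the quoted characters were never special, so this behaves
# like [abc].
in_quoted_abc_set() {
    case $1 in
        ["abc"]) echo yes ;;
        *)       echo no ;;
    esac
}

for s in a - m z; do
    echo "[a'-'z] against '$s': $(in_quoted_dash_set "$s")"
done
echo "\"[a-z]\" against 'm': $(is_literal_brackets m)"
echo "\"[a-z]\" against '[a-z]': $(is_literal_brackets '[a-z]')"
echo "[\"abc\"] against 'b': $(in_quoted_abc_set b)"
```

The case statement is used because it applies the same pattern matching rules as pathname expansion without touching the file system.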
>
> There are other issues that are less clear what should happen, if your
> example had been
>
>     [![:"lower:"]]
>
> then we get into very murky water indeed. XBD 9.3.5 says:
>
>     The character sequences "[.", "[=", and "[:" (<left-square-bracket>
>     followed by a <period>, <equals-sign>, or <colon>) shall be special
>     inside a bracket expression
>
> [aside: not related to my current point, the "shall be special" is what
> enables sh quoting to stop that from happening, since quoting in the shell
> prevents specialness from happening]
>
>     and are used to delimit collating symbols, equivalence class
>     expressions, and character class expressions.
>
> That part (so far) is clear and non-controversial.
>
>     These symbols shall be followed by a valid expression and the
>     matching terminating sequence ".]", "=]", or ":]", as described
>     in the following items.
>
> That's the part that is less clear. When a valid expression and the
> terminating sequence appear, there is no issue, and all is fine - what
> is less clear is what happens when one of those requirements is not met.
>
> Some read this as purely a requirement on the application - what the
> script writer is required to do - and when they don't the implementation
> (sh or RE library, or whatever) is free to interpret things (which means
> the whole pattern) however it likes (often as not being a pattern at all).
>
> Personally I disagree - I believe it is a requirement on the application
> if it desires the relevant sequence to be interpreted as a char class (etc)
> and if the application does not include a valid expression or terminating
> sequence the implementation should be required to treat the opening
> char sequence as if it did not begin a char class (etc) and the [: were
> simply 2 chars contained in the bracket expression (they must be in
> a bracket expression or the issue doesn't arise at all).
>
> Unfortunately (for the world in general, in that more and more of this
> is becoming unspecified, which makes it harder and harder to know what
> any particular sequence of characters will do) it seems like the former
> interpretation is the more likely to be adopted.
>
> If I have not understood the "this" in your
>
>     where exactly is this documented
>
> please be more precise, and I will try to answer.
>
> kre
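
As a small probe for the pattern discussed in the thread, the following sh sketch prints what the running shell does with *[![":lower":]]* for the two strings mentioned. On the reading given in the quoted message, foo should not match and f]oo should, but shells are reported to differ here, so the script only reports the result and does not assume one. The helper name is hypothetical.

```shell
#!/bin/sh
# Probe only: shows the behaviour of the shell running this script.
# It does not decide which behaviour is correct.
# probe_quoted_class is a hypothetical helper name.
probe_quoted_class() {
    case $1 in
        *[![":lower":]]*) echo match ;;
        *)                echo "no match" ;;
    esac
}

for s in foo 'f]oo'; do
    printf '%s: %s\n' "$s" "$(probe_quoted_class "$s")"
done
```

Running it under several shells (for example bash and dash, if installed) is one way to compare implementations side by side.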