On Fri, 2022-01-14 at 09:07 +0300, Oğuz wrote:
> > And where does it say that? I mean in the standard.
> > I.e. where does it say, that parsing is only allowed to happen in
> > one stage from left to right, especially not only with respect to
> > an RE itself, but also when an RE is embedded in a command with
> > delimiters.
>
> It is what makes sense.

Other than that it's more efficient as it requires only one pass, I
don't see why it should make more or less sense.

And as mentioned before, the wording of the standard IMO rather implies
the other behaviour. POSIX always says something like "backslash
followed by delimiter is »literal«" (whatever »literal« is supposed to
mean).
It does not say:
"A BY ITSELF NON-ESCAPED backslash followed by delimiter is »literal«"

> > Where does it say, whether:
> > s.[.].X.
> > is:
> > a) s/[/]/
> > followed by X.
> >
> > or:
> >
> > b) s/[.]/X/
>
> It is clearly `a'. The second period is not preceded by a backslash.

But that again just shows how ambiguous the standard is:
- if you prefer the left-to-right all-at-once parsing, then it's
  ambiguous, because the standard leaves open which rule wins: the one
  for bracket expressions, or the
  a-non-delimiter-would-need-to-be-preceded-by-\ one
- if one prefers the two stages, where one looks first for unquoted
  delimiter characters, which you say however "wouldn't be the one that
  makes sense", then it would be clearly (b).

And since one major implementation (GNU sed) would already get it
wrong, if it were (b),... it's IMO proof enough, that things aren't as
clear.

> > Then, with a delimiter that is also a special character, the
> > special character would no longer be usable as such.
>
> Again, that's why any character other than backslash and newline can
> be used as the delimiter.

You mean, that one could simply use another delimiter? But that's not
the point here: of course one can. And I'm fine if one says "it's
simply not possible", but this should then IMO also be clearly pointed
out.

> > btw: busybox sed, behaves as you say:
> > $ printf '%s\n' '.' | busybox sed 's.\..X.'
> > X
> > $ printf '%s\n' 'a' | busybox sed 's.\..X.'
> > a
>
> So do Solaris sed and Unixware sed. On the other hand, all seds I
> tested do this
>
> $ echo a | sed 's1(.)1\11'
> a
>
> except GNU sed; it prints 1 instead.

Uhm?
What you write above is a completely different test?! Not using bracket
expressions.
But it's another nice example, that the ambiguity also affects
back-references.

However, GNU sed 4.8 shows the following here:
$ echo a | sed 's1(.)1\11'
a
$ echo a | busybox sed 's1(.)1\11'
a

> The behavior varies widely with
> other special characters and sequences, so the best standard
> developers could do is to state that the results are unspecified when
> they are used as delimiters; OR, leave the text as is, which already
> implies that it is unspecified.

Which the standard however doesn't do as of now.

I'd also be fine with it, if the standard didn't mandate "one correct
behaviour" when "special" characters are used as delimiter,... but it
should then clearly say that for such characters the behaviour is
undefined (and indeed varies between major implementations) and also
name all the ones that are dangerous:
- at least all special characters (with respect to BRE/ERE)
- digits 1-9
- &

Cheers,
Chris.
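
For illustration only, the two competing readings of s.[.].X. can be
modelled with a small Python sketch. It is not taken from any sed
implementation and makes no claim about what any particular sed does;
split_s_command and its bracket_aware flag are hypothetical names, and
the bracket handling is deliberately simplified (no [:class:], [.x.] or
[=x=] support). It only shows that the result depends on whether the
backslash rule or the bracket-expression rule wins during a
left-to-right scan.

```python
def split_s_command(cmd, bracket_aware):
    """Split an 's' command into (regex, replacement, rest).

    Hypothetical helper: with bracket_aware=False every unescaped
    delimiter ends a field; with bracket_aware=True a delimiter inside
    a bracket expression in the regex field does not end the field.
    """
    if len(cmd) < 3 or cmd[0] != "s":
        raise ValueError("expected an 's' command with a delimiter")
    delim = cmd[1]
    fields = []        # completed fields: regex first, then replacement
    current = ""       # field currently being collected
    in_bracket = False
    bracket_len = 0    # characters seen since the opening '['
    i = 2
    while i < len(cmd) and len(fields) < 2:
        ch = cmd[i]
        if in_bracket:
            # A ']' is literal when it is the first character of the
            # bracket expression, or the first one after a leading '^'.
            first = bracket_len == 0 or (bracket_len == 1 and current[-1] == "^")
            if ch == "]" and not first:
                in_bracket = False
            current += ch
            bracket_len += 1
            i += 1
        elif ch == "\\" and i + 1 < len(cmd):
            # Keep the backslash pair together; what it means afterwards
            # is a separate question this sketch does not answer.
            current += cmd[i:i + 2]
            i += 2
        elif ch == delim:
            fields.append(current)
            current = ""
            i += 1
        else:
            # Only the regex field (no completed fields yet) can open
            # a bracket expression.
            if bracket_aware and not fields and ch == "[":
                in_bracket = True
                bracket_len = 0
            current += ch
            i += 1
    if len(fields) < 2:
        raise ValueError("unterminated 's' command")
    return fields[0], fields[1], cmd[i:]


# Backslash rule wins: reading (a), s/[/]/ followed by "X."
print(split_s_command("s.[.].X.", bracket_aware=False))  # ('[', ']', 'X.')
# Bracket-expression rule wins: reading (b), s/[.]/X/
print(split_s_command("s.[.].X.", bracket_aware=True))   # ('[.]', 'X', '')
```

Both results come from the same single left-to-right pass; the only
difference is which of the two rules is given priority.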