2018-05-23 22:44:46 +0100, Stephane CHAZELAS:
[...]
> [a-z] is not guaranteed to match on lower case letters only let
> alone abcdefghijklmnopqrstuvwxyz only, it may even match on
> characters outside the latin script.
[...]
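To illustrate the quoted point, here is a minimal Python sketch comparing a range interpreted by code point with one interpreted by collation order. The interleaved order aAbB...zZ and the helper names (COLLATION_ORDER, in_codepoint_range, in_collation_range) are assumptions made up for the illustration; they are not taken from any particular locale or regex implementation.

```python
import string

# Hypothetical collation order interleaving lower and upper case
# (aAbB...zZ). An assumption for illustration only, not the order
# of any specific real locale.
COLLATION_ORDER = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)
RANK = {ch: i for i, ch in enumerate(COLLATION_ORDER)}


def in_codepoint_range(ch, lo, hi):
    """True if ch lies between lo and hi by code point value."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_collation_range(ch, lo, hi):
    """True if ch lies between lo and hi in the assumed collation order."""
    if ch not in RANK:
        return False
    return RANK[lo] <= RANK[ch] <= RANK[hi]


letters = string.ascii_letters
by_codepoint = "".join(c for c in letters if in_codepoint_range(c, "a", "z"))
by_collation = "".join(c for c in letters if in_collation_range(c, "a", "z"))

print("code point:", by_codepoint)
print("collation: ", by_collation)
```

Under that assumed order, the code point range keeps only the 26 lower case letters, while the collation range also picks up A through Y (Z sorts after z, so it falls outside).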
Actually, I suspect that POSIX requires ranges in the POSIX locale to be based on collation (and leaves them unspecified in other locales) so that [a-z] is guaranteed to match on abcdefghijklmnopqrstuvwxyz only, even when the POSIX locale's charset is something like EBCDIC where those characters are not contiguous.

It's ironic that doing that for other locales would break the expectation that [a-z] should match on abcdefghijklmnopqrstuvwxyz, even though the locale's charset has them in the correct order.

Is that (the [a-z] based on collation) a POSIX invention by the way, or does it come from implementations that already existed at the time? What about [.elt.], [=equiv=], [:class:]? Are they a POSIX invention or a specification of prior art?

I've come across a past discussion on the GNU grep mailing list suggesting that collation-based ranges should only be used with [[.a.]-[.z.]], while [a-z] should be based on code point. That sounds to me like a nice idea.

-- 
Stephane