2018-05-23 22:44:46 +0100, Stephane CHAZELAS:
[...]
> [a-z] is not guaranteed to match lowercase letters only, let
> alone abcdefghijklmnopqrstuvwxyz only; it may even match
> characters outside the Latin script.
[...]

Actually, I suspect that POSIX requires ranges in the POSIX
locale to be based on collation order (and leaves them
unspecified in other locales) so that [a-z] is guaranteed to
match abcdefghijklmnopqrstuvwxyz only, even when the POSIX
locale's charset is something like EBCDIC, where those
characters are not contiguous.
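To illustrate why a plain code-point (byte-value) range would not do in EBCDIC, here is a minimal Python sketch. It assumes code page 037 (one common EBCDIC variant, available as Python's built-in `cp037` codec) as a stand-in for "an EBCDIC charset"; the helper name `naive_byte_range` is hypothetical. It collects every character whose byte value lies between those of `a` and `z`.

```python
# Assumption: cp037 (EBCDIC US/Canada) stands in for "an EBCDIC charset".
import string

def naive_byte_range(lo: str, hi: str, codec: str = "cp037") -> str:
    """Hypothetical helper: every character whose byte value in `codec`
    lies between the byte values of lo and hi (inclusive)."""
    start = lo.encode(codec)[0]   # 'a' is 0x81 in cp037
    end = hi.encode(codec)[0]     # 'z' is 0xA9 in cp037
    return bytes(range(start, end + 1)).decode(codec)

matched = naive_byte_range("a", "z")
letters = [c for c in matched if c in string.ascii_lowercase]
extras = [c for c in matched if c not in string.ascii_lowercase]

print(len(matched), len(letters), len(extras))  # 41 26 15
# Byte values of the non a-z characters caught by the byte range.
print([c.encode("cp037").hex() for c in extras])
```

In cp037 the lowercase letters sit in three runs (a-i, j-r, s-z) with gaps between them, so a byte range from `a` to `z` also catches the 15 characters in those gaps.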

It's ironic that doing the same in other locales would break
the expectation that [a-z] matches abcdefghijklmnopqrstuvwxyz
only, even when the locale's charset has those characters in
the correct (contiguous) order.

By the way, is that (collation-based [a-z]) a POSIX invention,
or does it come from implementations that already existed at
the time?

What about [.elt.], [=equiv=] and [:class:]? Are they POSIX
inventions, or specifications of prior art?

I've come across a past discussion on the GNU grep mailing list
suggesting that collation-based ranges should only be used
with [[.a.]-[.z.]], while [a-z] should be based on code
points.
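A minimal Python sketch of the two readings of a range. The collation order `TOY_COLLATION` (a < A < b < B < ... < z < Z) is a hypothetical stand-in, loosely modeled on locales that interleave upper and lower case; real locale collation is far more involved. The function names `match_codepoint` and `match_collation` are likewise illustrative only.

```python
import string

# Hypothetical collation order: a < A < b < B < ... < z < Z.
TOY_COLLATION = [c for pair in zip(string.ascii_lowercase,
                                   string.ascii_uppercase) for c in pair]
RANK = {c: i for i, c in enumerate(TOY_COLLATION)}

def match_codepoint(text: str, lo: str, hi: str) -> str:
    """Keep characters whose code point lies in [lo, hi] (the [a-z] reading)."""
    return "".join(c for c in text if ord(lo) <= ord(c) <= ord(hi))

def match_collation(text: str, lo: str, hi: str) -> str:
    """Keep characters that collate between lo and hi in the toy order
    (the [[.a.]-[.z.]] reading)."""
    return "".join(c for c in text
                   if c in RANK and RANK[lo] <= RANK[c] <= RANK[hi])

print(match_codepoint(string.ascii_letters, "a", "z"))
# abcdefghijklmnopqrstuvwxyz
print(match_collation(string.ascii_letters, "a", "z"))
# abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY
```

Under this toy order the collation-based range also picks up A through Y (but not Z, which sorts after z), which is the kind of surprise that confining collation semantics to the explicit [[.a.]-[.z.]] syntax would avoid.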

That sounds to me like a nice idea.

-- 
Stephane