On 09/23/2010 06:19 PM, Paul Eggert wrote:
> On 09/23/10 04:52, Paolo Bonzini wrote:
>> Or better, we're at glibc's mercy:
>> $ LC_ALL=cs_CZ.UTF-8 devel/grep/+build/src/grep -E '[A-Z]' in
>> 00a
>> 00g
>> 00A
>> 00G
>> 00Z
>> Yay for yet another definition of range expressions.
> Can we fix things so that we're not at glibc's mercy, even there?
> We could preprocess the regular expression [A-Z], and turn it into
> [ABCDEFGHIJKLMNOPQRSTUVWXYZ], before we hand it off to glibc.
> POSIX would allow this behavior, and users would prefer it.
> This could be done in a gnulib module, so that other GNU programs
> could also use the fix.
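For illustration only, the preprocessing idea quoted above could be sketched roughly as follows. This is not code from grep, gnulib, or glibc; the names expand_range and expand_simple_brackets are hypothetical, and Python is used only for brevity. The sketch assumes bracket expressions made of plain ASCII letters and digits, and it deliberately leaves anything containing [:class:], [.sym.] or [=equiv=] untouched. It also ignores backslash escapes, so it is not a full parser.

```python
import re

# Hypothetical helpers, not part of grep, gnulib, or glibc.
# Matches only "simple" bracket expressions: optional ^, then ASCII
# letters/digits, each optionally followed by "-X" to form a range.
_SIMPLE_BRACKET = re.compile(r"\[(\^?)((?:[A-Za-z0-9](?:-[A-Za-z0-9])?)+)\]")
_RANGE = re.compile(r"([A-Za-z0-9])-([A-Za-z0-9])")


def expand_range(lo: str, hi: str) -> str:
    """Return every character from lo to hi in code point order."""
    if ord(lo) > ord(hi):
        raise ValueError(f"reversed range {lo}-{hi}")
    return "".join(chr(c) for c in range(ord(lo), ord(hi) + 1))


def expand_simple_brackets(pattern: str) -> str:
    """Rewrite [A-Z] style ranges as explicit character lists."""
    def rewrite(match: re.Match) -> str:
        negate, body = match.group(1), match.group(2)
        expanded = _RANGE.sub(
            lambda r: expand_range(r.group(1), r.group(2)), body
        )
        return "[" + negate + expanded + "]"

    return _SIMPLE_BRACKET.sub(rewrite, pattern)


if __name__ == "__main__":
    print(expand_simple_brackets("[A-Z]"))
```

The expanded pattern no longer contains a range, so the matcher has no locale-dependent range to interpret.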
That would cause a great deal of confusion between tools. It is better
sorted out in glibc.
I don't see why glibc should refuse a proposal to differentiate
[A-Z] (a code point range) from [[.A.]-[.Z.]] (a real strcoll
comparison, though, not the current absurd behavior that everybody
hates). If somebody writes the patch, that is.
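To make the distinction concrete, here is a rough sketch of the two membership tests. It is not glibc code: the function names are hypothetical, Python is used only for brevity, and locale.strcoll stands in for the "real strcoll comparison" mentioned above. The comparator is a parameter so that the logic can be tried without installing a particular locale.

```python
import locale


def in_codepoint_range(ch: str, lo: str, hi: str) -> bool:
    """Proposed meaning of [lo-hi]: compare code points only."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_collation_range(ch: str, lo: str, hi: str, coll=locale.strcoll) -> bool:
    """Proposed meaning of [[.lo.]-[.hi.]]: lo <= ch <= hi by collation.

    coll is any strcoll-style comparator returning a negative, zero,
    or positive number; by default it is the current locale's strcoll.
    """
    return coll(lo, ch) <= 0 and coll(ch, hi) <= 0


if __name__ == "__main__":
    # Uses whatever LC_COLLATE the environment provides; the second
    # result depends on that locale's collation data.
    locale.setlocale(locale.LC_COLLATE, "")
    for ch in "agAGZ":
        print(ch, in_codepoint_range(ch, "A", "Z"),
              in_collation_range(ch, "A", "Z"))
```

With the code point test, lowercase letters never fall inside A-Z. With the collation test, whether they do depends entirely on the locale's collation order.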
A small disadvantage is that collation order would no longer be
available in fnmatch, since it is better to keep regex consistent with
fnmatch.
While we are at it: Bruno, you have worked a lot on the localization
features of glibc. Can you shed light on what __collseq_table_lookup is
supposed to mean?
Paolo