Sorry for chiming in on this rather late...

> Date: Fri, 24 Sep 2010 16:27:53 -0600
> From: Eric Blake <ebl...@redhat.com>
> To: Bruno Haible <br...@clisp.org>
> Cc: Paolo Bonzini <bonz...@gnu.org>, Paul Eggert <egg...@cs.ucla.edu>,
>     bug-grep@gnu.org, Jim Meyering <j...@meyering.net>
> Subject: Re: character ranges in regular expressions
>
> On 09/24/2010 03:52 PM, Bruno Haible wrote:
> >
> > 1) Is there an agreement of what the result should be? Jim seems to
> > prefer to extrapolate the result of the "C" locale, i.e. 26.
>
> As do I.
>
> > For other people, the locale
> > dependent behaviour is useful, that is, 51 is desired.
>
> Which is why my proposal is that glibc consider:
>
> [A-Z] => match C locale; 26 letters, regardless of locale
> [[.A.]-[.Z.]] => use collation rules, since we explicitly spelled things
> with collation symbols (26 letters in POSIX local, 51 or even more in
> other locales, since accented characters might be included in the
> collation range), so that we aren't completely losing CEO behavior (if
> someone seriously has a reason to use it)
> [[:upper:]] => per POSIX rules in all locales

This would be great. In what must be close to (or more than) 10 years
since gawk started supporting locales, I have yet to meet anyone who
thinks that [a-z] matching [A-Y] is a feature!

Thanks,

Arnold
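
For anyone wondering where the 26 and 51 come from, here is a rough sketch in Python. It does not call any real regex engine or locale database. The collation sequence "aAbB...zZ" is an assumption chosen for illustration (real locales define their own order, and may add accented characters), and the helper names are made up for this example.

```python
import string

# Hypothetical collation sequence: each lowercase letter sorts immediately
# before its uppercase counterpart (a A b B ... z Z). This is an assumption
# for illustration only, not the definition of any particular locale.
COLLATION = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def range_by_codepoint(start, end):
    """Characters in start..end by code point, as in the C locale."""
    return [chr(c) for c in range(ord(start), ord(end) + 1)]


def range_by_collation(start, end, order=COLLATION):
    """Characters in start..end by position in the collation sequence."""
    i, j = order.index(start), order.index(end)
    return list(order[i:j + 1])


c_upper = range_by_codepoint("A", "Z")
coll_upper = range_by_collation("A", "Z")
coll_lower = range_by_collation("a", "z")

print(len(c_upper))        # 26
print(len(coll_upper))     # 51
print("a" in coll_upper)   # False
print("Y" in coll_lower)   # True
print("Z" in coll_lower)   # False
```

Under that assumed ordering, the collation-based [A-Z] picks up every letter except "a", and [a-z] picks up "A" through "Y" but not "Z", which is the surprise complained about above.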