Re: character ranges in regular expressions

Aharon Robbins Mon, 04 Oct 2010 13:44:26 -0700

Sorry for chiming in on this rather late...

> Date: Fri, 24 Sep 2010 16:27:53 -0600
> From: Eric Blake <ebl...@redhat.com>
> To: Bruno Haible <br...@clisp.org>
> Cc: Paolo Bonzini <bonz...@gnu.org>, Paul Eggert <egg...@cs.ucla.edu>,
>         bug-grep@gnu.org, Jim Meyering <j...@meyering.net>
> Subject: Re: character ranges in regular expressions
>
> On 09/24/2010 03:52 PM, Bruno Haible wrote:
> >
> > 1) Is there an agreement of what the result should be? Jim seems to prefer
> > to extrapolate the result of the "C" locale, i.e. 26.
>
> As do I.
>
> > For other people, the locale
> > dependent behaviour is useful, that is, 51 is desired.
>
> Which is why my proposal is that glibc consider:
>
> [A-Z] => match C locale; 26 letters, regardless of locale
> [[.A.]-[.Z.]] => use collation rules, since we explicitly spelled things
> with collation symbols (26 letters in POSIX locale, 51 or even more in
> other locales, since accented characters might be included in the
> collation range), so that we aren't completely losing CEO behavior (if
> someone seriously has a reason to use it)
> [[:upper:]] => per POSIX rules in all locales


This would be great.  In the ten years or so (possibly more) since
gawk started supporting locales, I have yet to meet anyone who thinks
that [a-z] matching [A-Y] is a feature!
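
For anyone wondering where the 26 vs. 51 counts and the [A-Y] surprise
come from, here is a rough Python sketch.  It does not call any real
locale or regex machinery; it assumes a simplified, hypothetical
collation order of aAbBcC...zZ (real locale tables are larger and also
include accented characters), and the helper names are made up for
illustration.

```python
import string

# ASSUMPTION: a simplified, hypothetical collation order in which each
# lowercase letter is immediately followed by its uppercase form:
# aAbBcC...zZ.  Real locale collation tables are more elaborate.
COLLATION = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def range_by_codepoint(start, end):
    """Expand a range the way the "C" locale does: by character code."""
    return [chr(c) for c in range(ord(start), ord(end) + 1)]


def range_by_collation(start, end, order=COLLATION):
    """Expand a range by position in the (assumed) collation order."""
    i, j = order.index(start), order.index(end)
    return list(order[i:j + 1])


upper_c = range_by_codepoint("A", "Z")     # [A-Z] in the "C" locale
upper_coll = range_by_collation("A", "Z")  # [A-Z] under collation order
lower_coll = range_by_collation("a", "z")  # [a-z] under collation order

print(len(upper_c), len(upper_coll))
print("Y" in lower_coll, "Z" in lower_coll)
```

Under that assumed ordering, [A-Z] picks up every lowercase letter
except "a", and [a-z] picks up every uppercase letter except "Z", which
is the behavior being complained about.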

Thanks,

Arnold
