Re: [9fans] simplicity

erik quanstrom Tue, 09 Oct 2007 21:02:50 -0700

> Yes, old thread, sorry.  Blame Uriel.
> 
> On 9/18/07, Douglas A. Gwyn <[EMAIL PROTECTED]> wrote:
> > erik quanstrom wrote:
> > > suppose Linux user a and user b grep the same "text" file for the same 
> > > string.
> > > results will depend on the users' locales.
> >
> > But if they're trying to match an alphabetic character class, the
> > result *should* depend on the locale.
> 
> This baffles me.  Can anyone think of examples where one might want
> differing results depending on your locale?
> 
> -Jack


i think i see what the reasoning is.  the thought is that, e.g.,
in spanish [a-z] should match ñ.  

the problem is that this means grep(regexp, data) now
returns a set of results, one for each locale.

so on the one hand, one would like [a-z] to do the Right Thing,
depending on language.  and on the other hand, one wants
grep(regexp, data) to return a single result.
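
to make the "set of results" concrete, here is a minimal sketch that treats
[a-z] as "everything that collates between a and z". the two collation
tables (COLLATION, and the names in_range and grep_range) are hypothetical
and drastically simplified; real locale tables are far richer. the point
is only that the same regexp over the same data yields one answer per locale.

```python
# hypothetical, simplified collation orders; real locale tables are
# much larger.  they exist only to illustrate the argument.
COLLATION = {
    "C": [chr(c) for c in range(ord("a"), ord("z") + 1)],
    "es": list("abcdefghijklmn") + ["ñ"] + list("opqrstuvwxyz"),
}

def in_range(ch, lo, hi, locale):
    """true if ch collates between lo and hi in the given locale."""
    order = COLLATION[locale]
    if ch not in order:
        return False
    return order.index(lo) <= order.index(ch) <= order.index(hi)

def grep_range(lo, hi, data, locale):
    """return the lines of data containing a character in [lo-hi]."""
    return [line for line in data.splitlines()
            if any(in_range(ch, lo, hi, locale) for ch in line)]

data = "ñ\nx\n9"
results = {loc: grep_range("a", "z", data, loc) for loc in COLLATION}
print(results)  # {'C': ['x'], 'es': ['ñ', 'x']}
```

one regexp, one input, two answers.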

i think the way to see through this issue is to notice that
the reason we want ñ to be in [a-z] is visual
similarity.  what if we were dealing with chinese?  i think
it's pretty clear that [a-z] should map to a contiguous set
of unicode codepoints.
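
a minimal sketch of that reading, assuming a class is nothing more than
a closed interval of codepoints (the helper names are hypothetical). no
locale is consulted, so there is exactly one answer; ñ (U+00F1) simply
falls outside a..z (U+0061..U+007A), and a han range works the same way.

```python
def in_codepoint_range(ch, lo, hi):
    """[lo-hi] as a contiguous range of unicode codepoints; locale-free."""
    return ord(lo) <= ord(ch) <= ord(hi)

def grep_class(lo, hi, data):
    """return the lines of data containing a codepoint in [lo-hi]."""
    return [line for line in data.splitlines()
            if any(in_codepoint_range(ch, lo, hi) for ch in line)]

print(hex(ord("ñ")))                     # 0xf1, outside 0x61..0x7a
print(grep_class("a", "z", "ñ\nx\n9"))   # ['x']
print(grep_class("一", "龥", "中文\nabc"))  # ['中文'] (U+4E00..U+9FA5)
```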

if you want to deal with ñ, the unicode tables do note that ñ
(U+00F1) decomposes to n + combining ~ (U+006E U+0303), so one
could come up with a new denotation for "base codepoint".
unfortunately, combining that with existing regexp syntax would
be a bit painful.
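
a rough sketch of what a "base codepoint" class might mean, using the
canonical decomposition (NFD) from python's unicodedata module. the
function names are hypothetical, and no existing regexp syntax is
implied; this just compares the first codepoint of each character's
decomposition against an ordinary codepoint range.

```python
import unicodedata

def base_codepoint(ch):
    """first codepoint of ch's canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", ch)[0]

def grep_base_range(lo, hi, data):
    """hypothetical class: match ch if its base codepoint is in [lo-hi]."""
    return [line for line in data.splitlines()
            if any(ord(lo) <= ord(base_codepoint(ch)) <= ord(hi)
                   for ch in line)]

print([hex(ord(c)) for c in unicodedata.normalize("NFD", "ñ")])  # ['0x6e', '0x303']
print(grep_base_range("a", "z", "ñ\n9"))  # ['ñ']
```

part of the pain shows up quickly: this only follows canonical
decompositions, so a letter like ø (U+00F8), which has none, is not
treated as an o.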

- erik
