> Yes, old thread, sorry. Blame Uriel.
>
> On 9/18/07, Douglas A. Gwyn <[EMAIL PROTECTED]> wrote:
> > erik quanstrom wrote:
> > > suppose Linux user a and user b grep the same "text" file for the same
> > > string.
> > > results will depend on the users' locales.
> >
> > But if they're trying to match an alphabetic character class, the
> > result *should* depend on the locale.
>
> This baffles me. Can anyone think of examples where one might want
> differing results depending on your locale?
>
> -Jack

i think i see what the reasoning is. the thought is that, e.g., in
spanish [a-z] should match ñ. the problem is that this means
grep(regexp, data) now returns a set of results, one for each locale.

so on the one hand, one would like [a-z] to do the Right Thing,
depending on language. and on the other hand, one wants
grep(regexp, data) to return a single result.

i think the way to see through this issue is to notice that the
reason we want ñ to be in [a-z] is visual similarity. what if we
were dealing with chinese? i think it's pretty clear that [a-z]
should map to a contiguous set of unicode codepoints.

if you want to deal with ñ, the unicode tables do note that ñ is
n + combining ~, so one could come up with a new denotation for
the base codepoint. unfortunately, combining that with existing
regexp would be a bit painful.

- erik
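
a rough sketch of the "base codepoint" idea, in python rather than anything from plan 9. the helper names base_codepoint and match_base are made up for illustration, and taking the first codepoint of the canonical decomposition (NFD) is only one possible reading of "base codepoint". the point is that the result does not depend on anyone's locale:

```python
import re
import unicodedata

def base_codepoint(ch):
    # hypothetical helper: first codepoint of the canonical
    # decomposition (NFD), e.g. U+00F1 (ñ) -> U+006E (n)
    return unicodedata.normalize("NFD", ch)[0]

def match_base(pattern, text):
    # hypothetical helper: keep the characters of text whose
    # base codepoint matches pattern in full
    rx = re.compile(pattern)
    return [ch for ch in text if rx.fullmatch(base_codepoint(ch))]

word = "a\u00f1o"                      # "año" with precomposed ñ

# plain codepoint range: ñ (U+00F1) lies outside a-z
plain = re.findall(r"[a-z]", word)

# base-codepoint matching: ñ decomposes to n + U+0303
based = match_base(r"[a-z]", word)

print(plain)   # ['a', 'o']
print(based)   # ['a', 'ñ', 'o']
```

this only handles one character at a time. folding the same rule into a full regexp engine (ranges, repetition, input that is already decomposed) is where the pain would come in.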
