Re: Uppercase RE matching problems in FreeBSD 11

Stefan Bethke Sun, 06 Nov 2016 13:50:33 -0800

On 06.11.2016 at 22:27, Baptiste Daroussin <b...@freebsd.org> wrote:
> 
>> But under what circumstances would [A-Z] mean anything other than a 
>> character whose Unicode codepoint is between U+0041 and U+005A, inclusive?  
>> Especially given the locale in the example is en_US.UTF-8.  Or, put another 
>> way, why would an implementation interpret [A-Z] as anything other than 
>> [ABCDE…XYZ]?
> 
> The collation rules for Unicode come from http://cldr.unicode.org/, and they
> match the ones on Linux, for example, and the ones on illumos.
> 
> Some GNU tools explicitly decide to be non-locale-aware to avoid that
> kind of "surprise".
>> 
>> From reading your reference, I can see in 9.3.5.7:
>>> In the POSIX locale, a range expression represents the set of collating 
>>> elements that fall between two elements in the collation sequence, 
>>> inclusive. In other locales, a range expression has unspecified behavior[…]
>> 
>> So even if the observed behaviour is conforming, I’d think it’s still highly 
>> undesirable.
>> 
> That works for the POSIX locale, a.k.a. C, a.k.a. the ASCII-only world.


So what do I set my LANG and LC variables to?  I do want UTF-8, but I do also 
want my scripts to continue to work.  Clearly, en_US.UTF-8 is not what I want.  
Is it C.UTF-8?  Or do I set LANG=en_US.UTF-8 and LC_COLLATE=C?
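
To make the second option concrete, here is a minimal sketch of what it might
look like in a script. It assumes LC_ALL is unset (LC_ALL would override
LC_COLLATE), and has_upper is just a made-up helper name for illustration.
Whether this is the recommended setup on FreeBSD 11 is exactly the open
question; the sketch only shows the mechanics.

```sh
#!/bin/sh
# Sketch only: keep UTF-8 handling via LANG, but pin collation to C so that
# range expressions such as [A-Z] follow plain ASCII order.
# Assumption: LC_ALL must not be set, because it overrides LC_COLLATE.
unset LC_ALL
LANG=en_US.UTF-8
LC_COLLATE=C
export LANG LC_COLLATE

# has_upper is a hypothetical helper: it prints the input lines that contain
# at least one character matched by the range expression [A-Z].
has_upper() {
    grep '[A-Z]'
}

printf 'a\nB\nc\n' | has_upper
```

Writing [[:upper:]] instead of [A-Z] avoids depending on collation order at
all, at the cost of also matching non-ASCII uppercase letters in a UTF-8
locale.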


Stefan

-- 
Stefan Bethke <s...@lassitu.de>   Fon +49 151 14070811




_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
