On 06.11.2016 at 22:27, Baptiste Daroussin <b...@freebsd.org> wrote:
>
>> But under what circumstances would [A-Z] mean anything other than a
>> character whose Unicode codepoint is between U+0041 and U+005A, inclusive?
>> Especially given the locale in the example is en_US.UTF-8. Or, put another
>> way, why would an implementation interpret [A-Z] as anything other than
>> [ABCDE…XYZ]?
>
> The collation rules for Unicode come from http://cldr.unicode.org/, and they
> do match the ones on Linux, for example, and on illumos.
>
> Some GNU tools explicitly chose not to be locale-aware to avoid that kind
> of "surprise".
>>
>> From reading your reference, I can see in 9.3.5.7:
>>> In the POSIX locale, a range expression represents the set of collating
>>> elements that fall between two elements in the collation sequence,
>>> inclusive. In other locales, a range expression has unspecified behavior[…]
>>
>> So even if the observed behaviour is conforming, I'd think it's still highly
>> undesirable.
>>
> That works for the POSIX locale, aka C, aka the ASCII-only world.
So what should I set my LANG and LC_* variables to? I do want UTF-8, but I also want my scripts to keep working. Clearly, en_US.UTF-8 is not what I want. Is it C.UTF-8? Or do I set LANG=en_US.UTF-8 and LC_COLLATE=C?

Stefan

-- 
Stefan Bethke <s...@lassitu.de>   Phone +49 151 14070811