* Zorro Lang:

> On Wed, Jul 18, 2018 at 08:04:05AM +0200, Florian Weimer wrote:
>> * Zorro Lang:
>>
>> >> > > This is related to this glibc bug:
>> >> > >
>> >> > >   https://sourceware.org/bugzilla/show_bug.cgi?id=23393
>> >>
>> > A stranger thing is:
>> >   egrep [A-Z] match ABCD and bcd, but not match 'a'...
>>
>> That's the same issue as [0-9] not matching ９.
>>
>> > I already can't understand the new rules ...
>>
>> The range operator matches characters according to their collation
>> weight, and since the weight of 'a' is less than the weight of 'A',
>> 'a' is not included in the [A-Z] range.
>
> How to define/calculate the *weight* in your context? Why you say the
> weight of 'a' is less than the weight of 'A'?

This is a concept from POSIX collation, based on a locale definition.
I hope this link is reasonably stable:

  <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_02>

Basically, collation is an alternative way of sorting strings,
different from codepoint order, and it is specifically designed to
take cultural conventions into account.  Traditionally, most regular
expression range expressions such as [a-z] follow collation order,
although this is not required by POSIX for non-C/non-POSIX locales.

>> This could be fixed by including all characters with the same primary
>> weight as the endpoints (so that [ā-ẑ] and [a-z] would end up being
>> the same).  It makes the behavior more logical, but it doesn't fix
>> existing scripts.
>
> We find that the $LANG will affect how glibc deal with the wildcard.
> We all test on LANG=en_US.UTF-8, but if I set export LANG=C, then
> [a-z] and [A-Z] are all as expected, and xfstests make install works.

Right, this is expected: POSIX requires the behavior you need for the
"C" locale.

_______________________________________________
Bug-make mailing list
https://lists.gnu.org/mailman/listinfo/bug-make
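
To make the difference between the two orderings concrete, here is a
minimal sketch in Python.  It does not use the real glibc collation
tables: the weight table is a hypothetical one that I made up for
illustration, in which each lowercase letter sorts immediately before
its uppercase form (a < A < b < B < ...).  The helper names are likewise
invented.  With such a table, [A-Z] covers 'A', 'b', 'z' and 'Z' but not
'a', which is the behavior reported above, while a codepoint comparison
of the kind you get with LANG=C covers only the uppercase letters.

```python
import string

# Hypothetical collation order, NOT the real glibc table: each lowercase
# letter sorts immediately before its uppercase form (a < A < b < B < ...).
COLLATION_ORDER = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)

# Map every character to its position (its toy "weight") in that order.
WEIGHT = {ch: index for index, ch in enumerate(COLLATION_ORDER)}


def in_range_collation(ch, start, end):
    """Range test by toy collation weight, as in a non-C locale."""
    if ch not in WEIGHT:
        return False
    return WEIGHT[start] <= WEIGHT[ch] <= WEIGHT[end]


def in_range_codepoint(ch, start, end):
    """Range test by codepoint value, as with LANG=C."""
    return ord(start) <= ord(ch) <= ord(end)


if __name__ == "__main__":
    # Columns: character, in [A-Z] by toy collation, in [A-Z] by codepoint.
    for ch in "aAbzZ":
        print(ch,
              in_range_collation(ch, "A", "Z"),
              in_range_codepoint(ch, "A", "Z"))
```

The same toy table also shows why [a-z] misbehaves in the other
direction: 'Z' has the highest weight, so it falls outside [a-z] while
every other uppercase letter falls inside.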