2016-11-06 12:07, Baptiste Daroussin wrote:
Yes, A-Z only means uppercase in an ASCII-only world. In a Unicode world it
means AaBb...Z, because there are way more characters than simply A-Z. In
FreeBSD 11 we have Unicode collation instead of falling back on
LC_COLLATE=C, which means ASCII only.

For regexps, for example, one should use the character classes [:upper:] or
[:lower:] (written as [[:upper:]] and [[:lower:]] inside a bracket
expression).
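
As a rough sketch of that advice, the sed command from the original report
could be rewritten with a character class instead of a range. The sample
line is copied from the sysctl output quoted in this thread; the variable
names `line` and `result` are only illustrative.

```shell
# Sample input: the kern.boottime line quoted in the original report.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'

# [[:upper:]] is resolved through LC_CTYPE, not through collation order,
# so it should keep matching only uppercase letters in a UTF-8 locale.
# The leading .* is greedy, so the capture starts at the last uppercase letter.
result=$(printf '%s\n' "$line" | sed -e 's/.*\([[:upper:]].*\)$/\1/')

printf '%s\n' "$result"
```

With this input the capture should start at the 'N' of "Nov", which is the
result the old script expected.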

It is a good idea to keep LC_COLLATE and LC_NUMERIC (and LC_MONETARY?) at "C"
when LANG or LC_CTYPE is set to something else, otherwise unexpected
things may happen.
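
A minimal sketch of that setup, assuming a POSIX sh. The helper name
`with_c_collate` is hypothetical, not an existing utility. It runs a single
command with LC_COLLATE=C while leaving LANG and LC_CTYPE as they are.

```shell
# Hypothetical helper: run one command with C collation only.
with_c_collate() {
    (
        # LC_ALL, if set, overrides every LC_* variable, so clear it in
        # this subshell. Note that this also drops LC_ALL's effect on the
        # other categories for the wrapped command.
        unset LC_ALL
        LC_COLLATE=C
        export LC_COLLATE
        "$@"
    )
}

# Usage: with C collation, the range [A-Z] covers only ASCII uppercase.
matched=$(printf 'a\nB\n' | with_c_collate sed -n -e '/^[A-Z]$/p')
printf '%s\n' "$matched"
```

To make the setting permanent, the same `LC_COLLATE=C` assignment could be
exported from a shell startup file next to the existing LANG setting.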

  Mark


On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
I happened to run an old script today that uses sed(1) to extract the system boot time from the kern.boottime sysctl MIB. On 11.0 this no longer works as
expected:

$ sysctl kern.boottime
kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
$ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
v  5 16:18:34 2016

sed passes over 'S' and 'N' until it hits 'v', which it considers uppercase apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it works as
expected:

$ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
Nov  5 16:18:34 2016

Testing every lowercase character separately gives even more inconsistent
results:

$ cat <<! | LANG=en_US.UTF-8 sed -n -e '/^[A-Z]$/'p
> a
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
> p
> q
> r
> s
> t
> u
> v
> w
> x
> y
> z
> !
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

Here sed thinks every lowercase character except for 'a' is uppercase! This differs from the first test where sed did not think 'o' is uppercase. Again,
the above behaves as expected with LANG=C.

Does anyone have any insight into this? This is likely to break a lot of
existing code.
_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"