Re: Uppercase RE matching problems in FreeBSD 11

Mark Martinec Sun, 06 Nov 2016 04:28:07 -0800

2016-11-06 12:07, Baptiste Daroussin wrote:

Yes, A-Z only means uppercase in an ASCII-only world. In a Unicode world it means AaBb...Z, because there are way more characters than simple A-Z. In FreeBSD 11 we have a Unicode collation instead of falling back on LC_COLLATE=C, which means ASCII only.
For regexps, for example, one should use the character classes [:upper:] or [:lower:] (written inside a bracket expression, e.g. [[:upper:]]).
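Applied to the kern.boottime script in Greg's report, this means replacing the range [A-Z] with the class [[:upper:]]. A minimal sketch, assuming a POSIX sh and sed; the sample line is copied from the report, whereas on a FreeBSD host it would come from `sysctl kern.boottime`:

```shell
#!/bin/sh
# Sample kern.boottime output copied from the report; on FreeBSD this
# would come from: sysctl kern.boottime
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov  5 16:18:34 2016'

# [[:upper:]] names the uppercase letters of the current locale directly,
# so unlike the range [A-Z] it does not depend on collation order.
# The greedy leading .* makes the group start at the last uppercase letter.
boot=$(printf '%s\n' "$line" | sed -e 's/.*\([[:upper:]].*\)$/\1/')
printf '%s\n' "$boot"
```

This should print the date starting at "Nov" regardless of LANG.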

It is a good idea to keep LC_COLLATE and LC_NUMERIC (and LC_MONETARY?) at "C" when LANG or LC_CTYPE is set to something else; otherwise unexpected things may happen.
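A minimal sketch of that advice as a script preamble, assuming a POSIX sh and that bracket ranges follow LC_COLLATE (as they do on FreeBSD 11 per this thread). LC_ALL has to be unset because it overrides every other LC_* variable; the sample date string is taken from the report below:

```shell
#!/bin/sh
# Hypothetical script preamble: keep the user's LANG/LC_CTYPE for character
# handling, but force C ordering for collation and numeric formatting.
# LC_ALL would override both settings, so make sure it is not set.
unset LC_ALL
LC_COLLATE=C
LC_NUMERIC=C
export LC_COLLATE LC_NUMERIC

# With C collation, [A-Z] covers only the 26 ASCII uppercase letters.
boot=$(printf '%s\n' 'Sat Nov  5 16:18:34 2016' | sed -e 's/.*\([A-Z].*\)$/\1/')
printf '%s\n' "$boot"
```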

  Mark

On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
I happened to run an old script today that uses sed(1) to extract the system boot time from the kern.boottime sysctl MIB. On 11.0 this no longer works as expected:

$ sysctl kern.boottime
kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov  5 16:18:34 2016
$ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
v  5 16:18:34 2016
sed passes over 'S' and 'N' until it hits 'v', which it apparently considers uppercase. This is with LANG=en_US.UTF-8. If I set LANG=C, it works as expected:

$ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
Nov  5 16:18:34 2016
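Why the 11.0 output begins at 'v' follows from the greedy leading .*: the \(...\) group starts at the last character in the line that the bracket expression matches, not the first, so this test alone does not show how 'o' was classified. A sketch that reproduces both outputs in any locale, using [[:alpha:]] as an assumed stand-in for a bracket expression that also matches lowercase letters:

```shell
#!/bin/sh
date_str='Sat Nov  5 16:18:34 2016'

# The leading .* is greedy, so the group starts at the LAST character that
# matches the bracket expression. With [[:upper:]] that is the 'N' of "Nov".
upper_only=$(printf '%s\n' "$date_str" | sed -e 's/.*\([[:upper:]].*\)$/\1/')

# If the bracket expression also matches lowercase letters, the last match
# is the 'v' of "Nov"; [[:alpha:]] stands in for that broader set here.
any_letter=$(printf '%s\n' "$date_str" | sed -e 's/.*\([[:alpha:]].*\)$/\1/')

printf '%s\n' "$upper_only" "$any_letter"
```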
Testing every lowercase character separately gives even more inconsistent results:

$ cat <<! | LANG=en_US.UTF-8 sed -n -e '/^[A-Z]$/'p
> a
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
> p
> q
> r
> s
> t
> u
> v
> w
> x
> y
> z
> !
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
Here sed thinks every lowercase character except for 'a' is uppercase! This differs from the first test, where sed did not think 'o' is uppercase. Again, the above behaves as expected with LANG=C.
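For comparison, the same per-letter test can be run with the [[:upper:]] class instead of the [A-Z] range. A sketch assuming a POSIX sh; printf replaces the here-document only to keep it short:

```shell
#!/bin/sh
# Feed each lowercase letter on its own line and print those that match
# a single uppercase character; none should.
lower_hits=$(printf '%s\n' a b c d e f g h i j k l m n o p q r s t u v w x y z |
    sed -n -e '/^[[:upper:]]$/p')

# Count how many of the 26 uppercase letters match; all of them should.
# tr strips the padding some wc(1) implementations add.
upper_hits=$(printf '%s\n' A B C D E F G H I J K L M N O P Q R S T U V W X Y Z |
    sed -n -e '/^[[:upper:]]$/p' | wc -l | tr -d ' ')

printf 'lowercase matches: %s\n' "${lower_hits:-none}"
printf 'uppercase matches: %s\n' "$upper_hits"
```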
Does anyone have any insight into this? This is likely to break a lot of existing code.

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
