URL:
<http://savannah.gnu.org/bugs/?19637>
Summary: \w and [[:alnum:]] not equivalent in multibyte locale
Project: grep
Submitted by: None
Submitted on: Friday 20.04.2007 at 08:26 UTC
Category: None
Severity: 3 - Normal
Item Group: None
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
In grep's manpage I found:
The symbol \w is a synonym for [[:alnum:]]
This is true in a single-byte locale
(terminal = xterm -en iso-8859-1):
echo -e "a\nä\n" | LC_ALL=de_DE ./grep -E '\w'
a
ä
echo -e "a\nä\n_" | LC_ALL=de_DE ./grep -E '[[:alnum:]]'
a
ä
But not in a UTF-8 locale
(terminal = xterm -u8):
echo -e "a\nä" | LC_ALL=de_DE.utf8 ./grep -P '\w'
a
echo -e "a\nä\n_" | LC_ALL=de_DE.utf8 ./grep -E '[[:alnum:]]'
a
ä
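The report does not say where the difference comes from. One plausible explanation, offered here only as an assumption, is that \w ends up being tested against individual bytes, while [[:alnum:]] is tested against whole multibyte characters. The minimal sketch below uses Python's `re` module, not grep, to illustrate that distinction on the report's inputs ("a", "ä", "_"). `str.isalnum()` is only a rough stand-in for [[:alnum:]]: it uses Unicode properties, not the C library's locale tables. The helper `classify` and the names `WORD_CHAR`/`WORD_BYTE` are hypothetical, chosen for illustration.

```python
import re

# Hypothetical sample set: the input lines used in the report.
SAMPLES = ["a", "ä", "_"]

# Character-level matching: a str pattern works on decoded characters,
# so \w is Unicode-aware and "ä" counts as a word character.
WORD_CHAR = re.compile(r"\w")

# Byte-level matching: a bytes pattern sees the raw UTF-8 encoding.
# Without re.LOCALE, \w in a bytes pattern matches only ASCII [a-zA-Z0-9_],
# so neither byte of "ä" (0xC3 0xA4) is treated as a word character.
WORD_BYTE = re.compile(rb"\w")


def classify(text: str) -> dict:
    """Show how one input line is classified by each approach (illustration only)."""
    return {
        # Rough stand-in for [[:alnum:]]: Unicode-based, not locale-based.
        "alnum": text.isalnum(),
        # \w applied to the decoded character.
        "w_chars": WORD_CHAR.fullmatch(text) is not None,
        # \w applied to the UTF-8 bytes, one byte at a time.
        "w_utf8_bytes": WORD_BYTE.search(text.encode("utf-8")) is not None,
    }


if __name__ == "__main__":
    for s in SAMPLES:
        print(repr(s), classify(s))
```

Run on the report's inputs, the character-level \w agrees with the alnum check for "a" and "ä". The byte-level \w misses "ä", which mirrors the UTF-8 symptom above. Note also that in Python \w matches "_" while `isalnum()` does not. The report's \w tests did not include "_" as input.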
System:
openSUSE 10.2
grep from CVS
[volga:src] uname -a
Linux volga 2.6.18.8-0.1-default #1 SMP Fri Mar 2 13:51:59 UTC 2007 i686 i686 i386 GNU/Linux
[volga:src] locale -V
locale (GNU libc) 2.5
...
[volga:src] ./grep -V
GNU grep 2.5.1-cvs
...
Thanks, Sebastian
(wastl[]cis.uni-muenchen.de)
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?19637>
_______________________________________________
Message sent via Savannah
http://savannah.gnu.org/