Your message dated Tue, 24 Jun 2014 19:49:33 +0200
with message-id <20140624174933.GA25208@nomada>
and subject line Closing
has caused the Debian Bug report #387704,
regarding grep: -i breaks \W in some locales (perhaps UTF-8 locales only)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
387704: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=387704
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: grep
Version: 2.5.1.ds2-5
Severity: normal

I noticed that enabling --ignore-case suddenly caused certain patterns
not to match any longer although they should:

$ echo 'foo bar' | grep    '^foo\W'
foo bar
$ echo 'foo bar' | grep -i '^foo\W'
$

Digging further reveals that the locale has an influence, since
$ echo 'foo bar' | LANG=C grep -i '^foo\W'
foo bar
$

matches again. After a check using all my generated locales:

MATCH:
- de_DE
- de_DE@euro
- en_US

FAIL:
- de_DE.UTF-8
- de_DE.UTF-8@euro
- en_US.UTF-8

there's a strong impression that UTF-8 locales somehow disturb \W when
using -i.
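
A check like the one above can be sketched as a small shell loop. This is
only an illustration: the helper name check_locale is made up here, and the
locale list simply repeats the names from the report (on glibc systems,
`locale -a` lists the locales that are actually generated). A locale that is
not generated on the system typically falls back to C behaviour, so its
result says nothing about the bug.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep or Debian): report whether the
# case-insensitive \W pattern from the report matches under a given locale.
check_locale() {
    # LC_ALL overrides LANG and LC_CTYPE, so the locale under test always wins.
    if printf 'foo bar\n' | LC_ALL="$1" grep -qi '^foo\W' 2>/dev/null; then
        printf 'MATCH: %s\n' "$1"
    else
        printf 'FAIL: %s\n' "$1"
    fi
}

# Locale names taken from the report; adjust to whatever is installed.
for loc in C de_DE de_DE@euro en_US de_DE.UTF-8 de_DE.UTF-8@euro en_US.UTF-8; do
    check_locale "$loc"
done
```

On an affected grep the UTF-8 entries would be expected to print FAIL; on a
fixed version every line should read MATCH.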

Even more confusing, using the bracket expression instead of the synonym
matches again:
$ echo 'foo bar' | LANG=de_DE.UTF-8 grep -i '^foo[^[:alnum:]]'
foo bar
$
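
To see the discrepancy side by side, the two spellings can be run against the
same input under one locale. This is a minimal sketch; compare_patterns is a
hypothetical name, and the locale argument is assumed to be generated on the
system.

```shell
#!/bin/sh
# Hypothetical helper: run both spellings of the pattern with -i under the
# locale given as $1 and report which of them match 'foo bar'.
compare_patterns() {
    for pat in '^foo\W' '^foo[^[:alnum:]]'; do
        if printf 'foo bar\n' | LC_ALL="$1" grep -qi "$pat" 2>/dev/null; then
            printf '%s: match\n' "$pat"
        else
            printf '%s: no match\n' "$pat"
        fi
    done
}

compare_patterns de_DE.UTF-8
```

If the two lines disagree, the grep under test still shows the behaviour
described in this report.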

For the record, this sounds somewhat similar to #209194 and #218873, but
I have checked that those bugs are fixed in this version (2.5.1.ds2-5).

By the way, there's a typo in the manpage

  and
  .B \eW
  is a synonym for
- .BR [^[:alnum]] .
+ .BR [^[:alnum:]] .
  .PP

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17.13
Locale: LANG=de_DE.UTF-8@euro, LC_CTYPE=de_DE.UTF-8@euro (charmap=UTF-8)

Versions of packages grep depends on:
ii  libc6                        2.3.6.ds1-4 GNU C Library: Shared libraries

grep recommends no packages.

-- no debconf information

--- End Message ---
--- Begin Message ---
Version: 2.6.3-1

Hi,

I'm closing this bug since the issues with character classes and
case-insensitive matching in multi-byte locales were fixed in grep 2.6.

$ echo 'foo bar' | LANG=C grep '^foo\W'; echo $?
foo bar
0

$ echo 'foo bar' | LANG=es_CO.UTF-8 grep '^foo\W'; echo $?
foo bar
0

Regards,

Santiago

--- End Message ---
