Package: grep
Version: 2.6.3-3
Severity: normal

It seems that grep misclassifies combining characters (Unicode general
category Mn) as punctuation, when they should be treated as letters.
For instance:

$ echo d̪ʌ̀lì | grep -o '[[:alpha:]]*'
d
ʌ
li

As a consequence, combining accents are not seen as "word-constituent":

$ echo d̪ʌ̀lì | grep -o '\w*'
d
ʌ
li

This also causes false positives on word-boundary matches, such as
the following:

$ echo d̪ʌ̀lì | grep -w ʌ
d̪ʌ̀lì

I suggest that combining characters should be part of [:alpha:] instead
of [:punct:].
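
For reference, the general category of each code point in the sample
word can be checked against the Unicode database. The following is only
a sketch using Python's unicodedata module, not grep's or glibc's own
classification code. It assumes the final "ì" is decomposed (i followed
by U+0300), as the "li" in the output above suggests, and split_words is
a hypothetical helper illustrating the behaviour I would expect from
'[[:alpha:]]*'. The combining marks in this sample are reported as Mn
(nonspacing mark).

```python
import unicodedata

# Sample word from the report, spelled with explicit escapes:
# d + U+032A, U+028C + U+0300, l, i + U+0300.
# Assumption: the final accented "i" is decomposed.
SAMPLE = "d\u032a\u028c\u0300li\u0300"


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
        for ch in text
    ]


def split_words(text):
    """Hypothetical helper: runs of letters (L*) plus any marks (M*)
    that follow a letter within the same run."""
    words, current = [], ""
    for ch in text:
        cat = unicodedata.category(ch)
        if cat.startswith("L") or (cat.startswith("M") and current):
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    for row in describe(SAMPLE):
        print(*row)
    print(split_words(SAMPLE))
```

With marks kept attached to their base letters, the whole sample comes
out as a single word rather than three fragments.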

-- System Information:
Debian Release: 6.0.4
  APT prefers stable
  APT policy: (990, 'stable'), (10, 'oldstable'), (10, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/1 CPU core)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages grep depends on:
ii  dpkg                      1.15.8.12      Debian package management system
ii  install-info              4.13a.dfsg.1-6 Manage installed documentation in 
ii  libc6                     2.11.3-2       Embedded GNU C Library: Shared lib

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3                      8.02-1.1   Perl 5 Compatible Regular Expressi

-- no debconf information