Package: grep
Version: 2.6.3-3
Severity: normal

It seems that grep misclassifies combining letters (Unicode class Lm) as
punctuation, when they should be letters. For instance:

  $ echo d̪ʌ̀lì | grep -o '[[:alpha:]]*'
  d
  ʌ
  li

As a consequence, combining accents are not seen as "word-constituent":

  $ echo d̪ʌ̀lì | grep -o '\w*'
  d
  ʌ
  li

This also causes false positives on word-boundary conditions, such as the
following:

  $ echo d̪ʌ̀lì | grep -w ʌ
  d̪ʌ̀lì

I suggest that combining letters should be part of [:alpha:] instead of
[:punct:].

-- System Information:
Debian Release: 6.0.4
  APT prefers stable
  APT policy: (990, 'stable'), (10, 'oldstable'), (10, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/1 CPU core)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages grep depends on:
ii  dpkg          1.15.8.12       Debian package management system
ii  install-info  4.13a.dfsg.1-6  Manage installed documentation in
ii  libc6         2.11.3-2        Embedded GNU C Library: Shared lib

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3      8.02-1.1        Perl 5 Compatible Regular Expressi

-- no debconf information
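
For reference, the Unicode general category of each code point in the example word can be checked independently of grep. The following is a minimal Python sketch, not part of the original report: the helper name `categories` is hypothetical, and the exact code-point spelling of the word (in particular, that the final ì is a decomposed i followed by U+0300) is an assumption based on how the example renders.

```python
import unicodedata

# Assumed spelling of the example word from the report, written with explicit
# escapes: d + U+032A, U+028C + U+0300, l, i + U+0300.
WORD = "d\u032a\u028c\u0300li\u0300"


def categories(text):
    """Return (code point, general category, character name) per character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "<unnamed>"),
        )
        for ch in text
    ]


if __name__ == "__main__":
    # Print one line per code point so base letters and marks are easy to tell apart.
    for code_point, category, name in categories(WORD):
        print(code_point, category, name)
```

Under that assumption, the base letters come out as Ll and the combining marks as Mn (nonspacing mark) rather than Lm, which may be worth keeping in mind when comparing against how the locale defines [:alpha:] and [:punct:].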

