[email protected] wrote:
>> echo Y | LC_ALL=en_US.UTF-8 ./grep -i '[y]'
>
> I think gawk dfa fixes this.  It rings a vague bell....

That one at least is fixed by syncing from gawk's dfa.c.
Here's the patch I've just written.
Debian's 61-dfa.c-case_fold-charclass.patch had many
superfluous casts, but appeared to be semantically
equivalent to the dfa.c change below.

>From 4a0f966463ed44e90958aa75f048dace7edd3649 Mon Sep 17 00:00:00 2001
From: Jim Meyering <[email protected]>
Date: Thu, 4 Mar 2010 22:23:06 +0100
Subject: [PATCH] fix a bug in handling of -i and character classes

* dfa.c (parse_bracket_exp_mb): Sync from gawk's dfa.c.
* tests/case-fold-char-class: New file.  Test for the bug.
* tests/Makefile.am (TESTS): Add it.
(TESTS_ENVIRONMENT): Propagate LOCALE_FR and LOCALE_FR_UTF8
definitions into tests.
* NEWS (Bug fixes): Mention it.
---
 NEWS                       |    3 ++
 src/dfa.c                  |    7 +++++
 tests/Makefile.am          |   57 +++++++++++++++++++++++--------------------
 tests/case-fold-char-class |   14 ++++++++++
 4 files changed, 54 insertions(+), 27 deletions(-)
 create mode 100644 tests/case-fold-char-class

diff --git a/NEWS b/NEWS
index 70881c7..6685967 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ GNU grep NEWS                                    -*- outline -*-

 ** Bug fixes

+  grep -i with a character class would malfunction in multi-byte locales.
+  For example, echo Y | LC_ALL=en_US.UTF-8 grep -i '[y]' would print nothing.
+
   grep would mistakenly exit with status 1 upon error, rather than 2,
   as it is documented to do.

diff --git a/src/dfa.c b/src/dfa.c
index 60ec372..09c0c96 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -654,6 +654,13 @@ parse_bracket_exp_mb (void)
               REALLOC_IF_NECESSARY(work_mbc->chars, wchar_t, chars_al,
                                    work_mbc->nchars + 1);
               work_mbc->chars[work_mbc->nchars++] = (wchar_t)wc;
+              if (case_fold && (iswlower(wc) || iswupper(wc)))
+                {
+                  REALLOC_IF_NECESSARY(work_mbc->chars, wchar_t, chars_al,
+                                       work_mbc->nchars + 1);
+                  work_mbc->chars[work_mbc->nchars++] =
+                    (wchar_t) (iswlower(wc) ? towupper(wc) : towlower(wc));
+                }
             }
         }
       while ((wc = wc1) != L']');
diff --git a/tests/Makefile.am b/tests/Makefile.am
index cee1fa4..276209d 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -14,35 +14,36 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.

-TESTS =                 \
-  backref.sh            \
-  bre.sh                \
-  empty.sh              \
-  ere.sh                \
-  file.sh               \
-  fmbtest.sh            \
-  foad1.sh              \
-  help-version          \
-  khadafy.sh            \
-  max-count-vs-context  \
-  options.sh            \
-  pcre.sh               \
-  spencer1.sh           \
-  status.sh             \
-  warning.sh            \
-  word-multi-file       \
+TESTS =                         \
+  backref.sh                    \
+  bre.sh                        \
+  case-fold-char-class          \
+  empty.sh                      \
+  ere.sh                        \
+  file.sh                       \
+  fmbtest.sh                    \
+  foad1.sh                      \
+  help-version                  \
+  khadafy.sh                    \
+  max-count-vs-context          \
+  options.sh                    \
+  pcre.sh                       \
+  spencer1.sh                   \
+  status.sh                     \
+  warning.sh                    \
+  word-multi-file               \
   yesno.sh

-EXTRA_DIST =            \
-  $(TESTS)              \
-  bre.awk               \
-  bre.tests             \
-  ere.awk               \
-  ere.tests             \
-  init.sh               \
-  khadafy.lines         \
-  khadafy.regexp        \
-  spencer1.awk          \
+EXTRA_DIST =                    \
+  $(TESTS)                      \
+  bre.awk                       \
+  bre.tests                     \
+  ere.awk                       \
+  ere.tests                     \
+  init.sh                       \
+  khadafy.lines                 \
+  khadafy.regexp                \
+  spencer1.awk                  \
   spencer1.tests

 CLEANFILES = \
@@ -69,6 +70,8 @@ TESTS_ENVIRONMENT = \
     fi;                                         \
   };                                            \
   export                                        \
+  LOCALE_FR='$(LOCALE_FR)'                      \
+  LOCALE_FR_UTF8='$(LOCALE_FR_UTF8)'            \
   AWK=$(AWK)                                    \
   GREP=$(top_builddir)/src/grep                 \
   GREP_OPTIONS=''                               \
diff --git a/tests/case-fold-char-class b/tests/case-fold-char-class
new file mode 100644
index 0000000..c36b314
--- /dev/null
+++ b/tests/case-fold-char-class
@@ -0,0 +1,14 @@
+#!/bin/sh
+# This would fail for grep-2.5.3
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+printf 'Y\n' > exp || framework_failure
+fail=0
+
+for LOC in en_US.UTF-8 zh_CN $LOCALE_FR_UTF8; do
+  printf 'X\nY\nZ\n' | LC_ALL=$LOC grep -i '[y]' > out || fail=1
+  compare out exp || fail=1
+done
+
+Exit $fail
--
1.7.0.1.300.gd855a
