Jim Meyering wrote:
> Ilya Basin wrote:
>> $ grep -i . greptest.txt
>> aIabIbcIcdId$
>>
>> This doesn't happen without -i or with LANG=C
>>
>>
>> $ grep --version
>> grep (GNU grep) 2.7
>> $ echo $LANG
>> en_US.UTF-8
>>
>> pcre 8.10
>>
>> Linux IL 2.6.36-ARCH #1 SMP PREEMPT Wed Nov 24 06:44:11 UTC 2010 i686
>> Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz GenuineIntel GNU/Linux
>
> Thanks for the report.  That is indeed a bug.
> It affects even the very latest in git.
>
> Here's another variant of it:
> [note how it fails to print the matched "."]
>
>   $ i='\xC4\xB0'; printf "$i$i$i.$i$i$i$i\n" \
>     | LC_ALL=en_US.UTF-8 ./grep -oi '.\.'|od -a -tx1
>   0000000   D   0  nl
>            c4  b0  0a
>   0000003
>
> -----------------------------
> More like your example, this shows how, with -i,
> grep is searching a different string (down-cased)
> and the width of the lower-case "i" is just one byte.
> The end-of-line offset is calculated using the all-lower-case
> string, yet that offset is not valid in the original, longer string,
> so grep fails to print the entire line:
>
>   i='\xC4\xB0'; printf "$i$i$i$i$i$i$i\n" |LC_ALL=en_US.UTF-8 ./grep -i ....
>   İİİİ
>
> One of us should find time to fix it before too long.
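The offset mismatch described above can be modelled without running grep at all: measure the line before and after case folding, then cut the original bytes at the folded end-of-line offset. This is a sketch, not grep's code. It assumes the folded copy is simply seven ASCII "i" characters (the simple lower-case mapping of U+0130 is U+0069), and it writes U+0130 as octal \304\260, the same two bytes as \xC4\xB0, because POSIX printf does not guarantee \x escapes. Variable names are illustrative only.

```shell
#!/bin/sh
# Model of the -i offset bug (illustrative; does not call grep).
# \304\260 == \xC4\xB0 == U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE.
i='\304\260'

line=$(printf "$i$i$i$i$i$i$i")   # original line: 14 bytes before the newline
folded='iiiiiii'                   # assumed lower-cased copy: 7 bytes

# End-of-line offset measured in the folded copy (newline included).
eol=$(printf '%s\n' "$folded" | wc -c | tr -d ' ')
# Real length of the original line (newline included).
orig_len=$(printf '%s\n' "$line" | wc -c | tr -d ' ')

# Apply the folded offset to the original bytes: the line is cut short
# and the trailing newline is lost.
printed=$(printf '%s\n' "$line" | dd bs=1 count="$eol" 2>/dev/null)

echo "folded EOL offset: $eol, original line length: $orig_len"
printf '%s' "$printed" | od -An -tx1
```

The bytes it keeps are exactly four U+0130 characters with no newline, matching the `İİİİ` output shown in the report.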
First step is (at least this time) to write the test.
I've just pushed this:

From 955695aea8fac194db07009a8673af3aaa6e0f8c Mon Sep 17 00:00:00 2001
From: Jim Meyering <[email protected]>
Date: Wed, 19 Jan 2011 22:12:09 +0100
Subject: [PATCH 1/2] maint: sort test names in Makefile.am

* tests/Makefile.am (TESTS): Sort test names.
---
 tests/Makefile.am |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/Makefile.am b/tests/Makefile.am
index ac0e3c1..0d78d26 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -35,9 +35,9 @@ endif

 TESTS = \
   backref \
+  backref-multibyte-slow \
   backref-word \
   bre \
-  backref-multibyte-slow \
   case-fold-backref \
   case-fold-backslash-w \
   case-fold-char-class \
@@ -46,8 +46,8 @@ TESTS = \
   char-class-multibyte \
   dfaexec-multibyte \
   empty \
-  ere \
   equiv-classes \
+  ere \
   euc-mb \
   fedora \
   fgrep-infloop \
@@ -65,15 +65,15 @@ TESTS = \
   options \
   pcre \
   pcre-z \
+  prefix-of-multibyte \
   reversed-range-endpoints \
   sjis-mb \
   spencer1 \
   spencer1-locale \
   status \
-  prefix-of-multibyte \
   warn-char-classes \
-  word-multi-file \
   word-delim-multibyte \
+  word-multi-file \
   yesno

 EXTRA_DIST = \
-- 
1.7.3.5


From ebfc46553d56ec3ab3feade82e53fac0863fd102 Mon Sep 17 00:00:00 2001
From: Jim Meyering <[email protected]>
Date: Wed, 19 Jan 2011 22:12:43 +0100
Subject: [PATCH 2/2] tests: add a known-to-fail test

* tests/turkish-I: New test.
* tests/Makefile.am (TESTS): Add it.
(XFAIL_TESTS): Add here, too.
Reported by Ilya Basin.
---
 THANKS            |    1 +
 tests/Makefile.am |    2 ++
 tests/turkish-I   |   32 ++++++++++++++++++++++++++++++++
 3 files changed, 35 insertions(+), 0 deletions(-)
 create mode 100755 tests/turkish-I

diff --git a/THANKS b/THANKS
index 8c3d0d9..116b9c4 100644
--- a/THANKS
+++ b/THANKS
@@ -37,6 +37,7 @@ H. Merijn Brand <[email protected]>
 Harald Hanche-Olsen <[email protected]>
 Hans-Bernhard Broeker <[email protected]>
 Heikki Korpela <[email protected]>
+Ilya Basin <[email protected]>
 Isamu Hasegawa <[email protected]>
 Jaroslav Škarvada <[email protected]>
 Jeff Bailey <[email protected]>
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 0d78d26..7233c01 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -32,6 +32,7 @@ XFAIL_TESTS = \
 if USE_INCLUDED_REGEX
 XFAIL_TESTS += equiv-classes
 endif
+XFAIL_TESTS += turkish-I

 TESTS = \
   backref \
@@ -71,6 +72,7 @@ TESTS = \
   spencer1 \
   spencer1-locale \
   status \
+  turkish-I \
   warn-char-classes \
   word-delim-multibyte \
   word-multi-file \
diff --git a/tests/turkish-I b/tests/turkish-I
new file mode 100755
index 0000000..ac536c4
--- /dev/null
+++ b/tests/turkish-I
@@ -0,0 +1,32 @@
+#!/bin/sh
+# grep -i in UTF-8: missing NL in output on line containing I WITH DOT (U+0130)
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+require_en_utf8_locale_
+
+fail=0
+
+i='\xC4\xB0'
+printf "$i$i$i$i$i$i$i\n" > in || framework_failure_
+
+LC_ALL=en_US.UTF-8 grep -i .... in > out || fail=1
+
+compare out in || fail=1
+
+Exit $fail
-- 
1.7.3.5
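For reproducing the check outside the grep source tree, here is a standalone sketch of the same test. It assumes a `locale` utility and a `grep` in PATH, and it replaces the suite's init.sh helpers (`require_en_utf8_locale_`, `framework_failure_`, `compare`, `Exit`) with plain POSIX commands; the function name `turkish_i_check` is hypothetical. Exit statuses follow the Automake test-harness convention: 0 pass, 1 fail, 77 skip, 99 hard error.

```shell
#!/bin/sh
# Hypothetical standalone variant of tests/turkish-I (no init.sh needed).

turkish_i_check() {
  # Skip (77) unless the en_US.UTF-8 locale is actually installed.
  charmap=$(LC_ALL=en_US.UTF-8 locale charmap 2>/dev/null)
  [ "$charmap" = UTF-8 ] || return 77

  tmp=${TMPDIR:-/tmp}/turkish-I.$$
  mkdir "$tmp" || return 99            # framework failure, not a grep bug

  i='\304\260'                         # U+0130, same bytes as \xC4\xB0
  printf "$i$i$i$i$i$i$i\n" > "$tmp/in"

  # The whole 15-byte line (newline included) must come back unchanged.
  LC_ALL=en_US.UTF-8 grep -i .... "$tmp/in" > "$tmp/out"
  st=$?
  if [ "$st" -eq 0 ] && cmp -s "$tmp/in" "$tmp/out"; then st=0; else st=1; fi

  rm -rf "$tmp"
  return "$st"
}

turkish_i_check
result=$?
case $result in
  0)  echo PASS ;;
  77) echo SKIP ;;
  *)  echo FAIL ;;
esac
```

Because the patch lists turkish-I in XFAIL_TESTS, Automake reports its failure as an expected failure (XFAIL); once grep is fixed, the test will show up as an unexpected pass (XPASS), which the harness counts as an error, signalling that the entry should be dropped from XFAIL_TESTS.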
