Ilya Basin wrote:
> $ grep -i . greptest.txt
> aIabIbcIcdId$
>
> This doesn't happen without -i or with LANG=C
>
>
> $ grep --version
> grep (GNU grep) 2.7
> $ echo $LANG
> en_US.UTF-8
>
> pcre 8.10
>
> Linux IL 2.6.36-ARCH #1 SMP PREEMPT Wed Nov 24 06:44:11 UTC 2010 i686
> Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz GenuineIntel GNU/Linux

Thanks for the report.  That is indeed a bug,
and it affects even the very latest code in git.
Here's another variant of it
(note how it fails to print the matched "."):

$ i='\xC4\xB0'; printf "$i$i$i.$i$i$i$i\n" \
| LC_ALL=en_US.UTF-8 ./grep -oi '.\.'|od -a -tx1
0000000   D   0  nl
         c4  b0  0a
0000003
-----------------------------
This one is closer to your example.  With -i, grep searches a
down-cased copy of the line rather than the original, and in that
copy each two-byte "İ" (U+0130) has become a one-byte "i".
The end-of-line offset is computed in the shorter, lower-case copy,
but it is then applied to the original, longer line, where it is
not valid, so grep fails to print the entire line:

$ i='\xC4\xB0'; printf "$i$i$i$i$i$i$i\n" |LC_ALL=en_US.UTF-8 ./grep -i ....
İİİİ
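
To make the arithmetic concrete, here is a rough sketch of the offset
mismatch in plain shell.  It is not grep's code: the sed substitution
only stands in for the case folding (assuming U+0130 folds to a
one-byte "i", as the output above suggests), the variable names are
made up for illustration, and I am assuming the end-of-line offset
counts the trailing newline.

```shell
# Illustrative only: imitates the byte arithmetic, not grep's internals.
I=$(printf '\304\260')            # U+0130 as UTF-8: bytes C4 B0
orig="$I$I$I$I$I$I$I"             # the 7-character input line, 14 bytes

# Stand-in for down-casing (assumption): every 2-byte U+0130 becomes "i".
lower=$(printf '%s' "$orig" | LC_ALL=C sed "s/$I/i/g")

orig_len=$(( $(printf '%s' "$orig" | wc -c) ))
lower_len=$(( $(printf '%s' "$lower" | wc -c) ))

# End-of-line offset as measured in the lowered copy (text + newline).
end=$(( lower_len + 1 ))

# The same offset misapplied to the original, longer buffer.
printed_len=$(( $(printf '%s' "$orig" | head -c "$end" | wc -c) ))
printed_chars=$(( printed_len / 2 ))

echo "original: $orig_len bytes, lowered: $lower_len bytes"
echo "offset $end covers only $printed_chars of 7 characters"
```

With the 7-character line above, the offset taken from the lowered
copy is 8 bytes, which spans only 4 of the two-byte characters in the
original, matching the truncated output.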
One of us should find time to fix it before too long.