bug#16586: bug#17245: GREP BUG: grep -P and binary files

Jim Meyering Thu, 24 Apr 2014 18:48:13 -0700

On Wed, Apr 23, 2014 at 10:39 PM, Paul Eggert <[email protected]> wrote:
> Jim Meyering wrote:
>>
>> anyone using grep -P to search data that is even a tiny bit
>> inconsistent with their UTF-8 locale will now get an exit status of
>> 2 rather than the matches they used to get.
>
>
> Yes, I don't like that either, but <http://bugs.exim.org/1468> says libpcre


Oh! I had not read that. That is disappointing.

> intends to have undefined behavior here.  If so, it wouldn't help to wait
> until the next libpcre release, which may well have a serious bug of this
> form in a different area, a bug that's not easy to test for.

Indeed.

> Perhaps somebody should modify grep -P to discard input lines containing
> non-UTF-8 data instead of presenting them to libpcre.  That way, it would be
> safe for grep -P to use PCRE_NO_UTF8_CHECK.  Although grep -P should report
> an error and exit with status 2 if it discards input due to encoding errors,
> it can also report matches in lines that do not contain encoding errors, so
> that users can see both the error messages and the matches.

That sounds reasonable, but I don't like the requirement that
one make two passes over each subject text.
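
For illustration only, here is a minimal sketch of the kind of per-line
check the proposal above would need before a line could be handed to
libpcre with PCRE_NO_UTF8_CHECK. It is not grep's code: the function name
line_is_valid_utf8 is hypothetical, and the rules it enforces are the
well-formedness rules of RFC 3629 (no overlong forms, no surrogates,
nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep or libpcre.
   Return true if BUF[0..LEN) is well-formed UTF-8.  */
bool
line_is_valid_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t n;           /* continuation bytes expected after C */
      unsigned long min;  /* smallest code point legal at this length */
      unsigned long cp;   /* code point decoded so far */

      if (c < 0x80)
        {
          i++;            /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { n = 1; min = 0x80; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { n = 2; min = 0x800; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0)
        { n = 3; min = 0x10000; cp = c & 0x07; }
      else
        return false;     /* stray continuation byte or invalid lead byte */

      if (len - i - 1 < n)
        return false;     /* sequence truncated at end of line */

      for (size_t k = 1; k <= n; k++)
        {
          unsigned char d = buf[i + k];
          if ((d & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (d & 0x3F);
        }

      /* Reject overlong encodings, surrogates, and out-of-range values.  */
      if (cp < min || cp > 0x10FFFF || (0xD800 <= cp && cp <= 0xDFFF))
        return false;

      i += n + 1;
    }
  return true;
}
```

A caller would run this on each line and pass only the lines that return
true to the matcher, remembering that at least one line was discarded so
that it can report the error and exit with status 2. This check is
exactly the extra pass over the subject text that I object to above.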
