On Wed, Apr 23, 2014 at 10:39 PM, Paul Eggert <[email protected]> wrote:
> Jim Meyering wrote:
>>
>> anyone using grep -P to search data that is even a tiny bit
>> inconsistent with their UTF-8 locale will now get an exit status of
>> 2 rather than the matches they used to get.
>
> Yes, I don't like that either, but <http://bugs.exim.org/1468> says libpcre

Oh! I had not read that. That is disappointing.

> intends to have undefined behavior here. If so, it wouldn't help to wait
> until the next libpcre release, which may well have a serious bug of this
> form in a different area, a bug that's not easy to test for.

Indeed.

> Perhaps somebody should modify grep -P to discard input lines containing
> non-UTF-8 data instead of presenting them to libpcre. That way, it would be
> safe for grep -P to use PCRE_NO_UTF8_CHECK. Although grep -P should report
> an error and exit with status 2 if it discards input due to encoding errors,
> it can also report matches in lines that do not contain encoding errors, so
> that users can see both the error messages and the matches.

That sounds reasonable, but I don't like the requirement that one make
two passes over each subject text.
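
For concreteness, here is a rough sketch in C of the discard-then-match idea quoted above. It is not grep's code. The names valid_utf8, scan_buffer, match_foo, and exit_status are hypothetical, and match_foo is only a stand-in for the point where a real implementation would call pcre_exec with PCRE_NO_UTF8_CHECK. The exit-status mapping assumes grep's usual convention (0 for a match, 1 for none, 2 for an error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the LEN bytes at S are well-formed UTF-8: no overlong
   forms, no surrogates, nothing above U+10FFFF.  */
static bool
valid_utf8 (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;                            /* continuation bytes expected */
      unsigned char lo = 0x80, hi = 0xBF;  /* range for the first of them */

      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if (0xC2 <= c && c <= 0xDF)
        n = 1;
      else if (0xE0 <= c && c <= 0xEF)
        {
          n = 2;
          if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
          if (c == 0xED) hi = 0x9F;        /* reject surrogates */
        }
      else if (0xF0 <= c && c <= 0xF4)
        {
          n = 3;
          if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
          if (c == 0xF4) hi = 0x8F;        /* reject > U+10FFFF */
        }
      else
        return false;                      /* stray continuation, C0, C1, F5.. */

      if (len - i <= n)
        return false;                      /* sequence truncated at end of line */
      if (s[i + 1] < lo || hi < s[i + 1])
        return false;
      for (size_t k = 2; k <= n; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;
      i += n + 1;
    }
  return true;
}

/* Hypothetical matcher type.  A real one would wrap pcre_exec and could
   pass PCRE_NO_UTF8_CHECK, since only validated lines reach it.  */
typedef bool (*line_matcher) (const char *line, size_t len);

struct scan_result { size_t matched; size_t discarded; };

/* Split BUF into newline-terminated lines.  Discard lines that are not
   valid UTF-8; present the rest to MATCH.  */
static struct scan_result
scan_buffer (const char *buf, size_t size, line_matcher match)
{
  struct scan_result r = { 0, 0 };
  const char *p = buf, *end = buf + size;
  while (p < end)
    {
      const char *nl = memchr (p, '\n', (size_t) (end - p));
      size_t len = nl ? (size_t) (nl - p) : (size_t) (end - p);
      if (!valid_utf8 ((const unsigned char *) p, len))
        r.discarded++;                     /* never shown to the matcher */
      else if (match (p, len))
        r.matched++;
      p += len + (nl != NULL);
    }
  return r;
}

/* Stand-in matcher for illustration only: does the line contain "foo"?  */
static bool
match_foo (const char *line, size_t len)
{
  for (size_t i = 0; i + 3 <= len; i++)
    if (memcmp (line + i, "foo", 3) == 0)
      return true;
  return false;
}

/* Status 2 if anything was discarded, else 0 on a match, else 1.  */
static int
exit_status (struct scan_result r)
{
  return r.discarded ? 2 : r.matched ? 0 : 1;
}
```

Calling scan_buffer on input that mixes clean and damaged lines still counts the matches in the clean lines, while exit_status reports 2 because something was discarded. Note that this sketch reads every line twice, once in valid_utf8 and once in the matcher, which is exactly the two-pass cost objected to above.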
