On Mon, Apr 21, 2014 at 11:03 AM, Paul Eggert <[email protected]> wrote:
> On 04/16/2014 05:13 AM, Norihiro Tanaka wrote:
>>
>> http://bugs.exim.org/show_bug.cgi?id=1468
>
> Thanks.  The response there makes it clear that if grep passes arbitrary
> binary data to PCRE, and if grep uses PCRE_NO_UTF8_CHECK, undefined behavior
> will result (maybe infinite loop, core dump, etc.).  We can't have undefined
> behavior in grep.  A simple fix is to avoid using PCRE_NO_UTF8_CHECK so I
> installed the attached patch to do that.  Perhaps we can think of a better
> way at some point.  In the meantime I'm taking the liberty of closing
> Bug#17245 and Bug#16586.

Thanks for the patch, but I'm not sure I like the consequences:
that anyone using grep -P to search data that is even a tiny bit
inconsistent with their UTF-8 locale will now get an exit status of 2
rather than the matches they used to get.

I would prefer to test for working PCRE support and disable -P if it
is deemed inadequate, but that may have to wait for the release of a
new version of libpcre.

In any case, I found that this additional change is required, at least
on OS/X, to avoid a test failure:

From b80a95691418ce19b42b54c706633ef8be0bd9ee Mon Sep 17 00:00:00 2001
From: Jim Meyering <[email protected]>
Date: Wed, 23 Apr 2014 19:21:11 -0700
Subject: [PATCH] tests: use consistent spelling for locale name, en_US.UTF-8

* tests/pcre-infloop: Spell locale name, en_US.UTF-8, consistently,
converting this one use from "en_US.utf8", which would provoke a
test failure on OS/X.
---
 tests/pcre-infloop | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/pcre-infloop b/tests/pcre-infloop
index febf356..1b33e72 100755
--- a/tests/pcre-infloop
+++ b/tests/pcre-infloop
@@ -27,7 +27,7 @@ printf 'a\201b\r' > in || framework_failure_
 
 fail=0
 
-LC_ALL=en_US.utf8 timeout 3 grep -P 'a.?..b' in
+LC_ALL=en_US.UTF-8 timeout 3 grep -P 'a.?..b' in
 test $? = 2 || fail_ "libpcre's match function appears to infloop"
 
 Exit $fail
-- 
1.9.2.459.g68773ac
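
As an aside on the spelling issue the patch addresses: the list of installed locale names differs between systems (glibc's `locale -a` typically prints "en_US.utf8", while OS/X prints "en_US.UTF-8"), so a script that wants to be independent of the spelling could look the name up rather than hard-code it. The following is only a rough sketch of that idea. The function name pick_en_utf8_locale is hypothetical and is not part of grep's test suite, and the patch above simply standardizes on the "en_US.UTF-8" spelling instead.

```shell
# Hypothetical helper (an assumption for illustration, not from grep's tests).
# Reads locale names, one per line, on stdin (for example from `locale -a`)
# and prints the first one that names en_US with a UTF-8 codeset,
# accepting both the "utf8" and "UTF-8" spellings.
pick_en_utf8_locale() {
  grep -i -E '^en_US\.utf-?8$' | head -n 1
}

# Possible use, assuming `locale -a` is available on the system:
#   loc=$(locale -a | pick_en_utf8_locale)
#   test -n "$loc" && LC_ALL=$loc timeout 3 grep -P 'a.?..b' in
```

If no matching locale is installed, the function prints nothing, so a caller can check for an empty result and skip the test rather than fail it.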
