Paolo Bonzini wrote:
> This is the patch attached to https://bugzilla.redhat.com/683753
> and http://savannah.gnu.org/patch/?3934, with testcases.
>
> Paolo
>
> Paolo Bonzini (1):
>   tests: include UTF-8 testcases for grep -P
>
> Petr Pisar (1):
>   pcresearch: set UTF-8 flag correctly for UTF-8 locales
>
>  NEWS              |    6 ++++++
>  src/pcresearch.c  |    8 ++++++++
>  tests/Makefile.am |    1 +
>  tests/pcre-utf8   |   33 +++++++++++++++++++++++++++++++++
>  4 files changed, 48 insertions(+)
>  create mode 100755 tests/pcre-utf8

Thanks for the quick work, Paolo.
I will push this follow-on patch shortly, along with one more
to factor out the now-duplicate STREQ definition.

>From 9df414a75f101a1f7f25c5850d5cfc2e242f6ff8 Mon Sep 17 00:00:00 2001
From: Jim Meyering <[email protected]>
Date: Wed, 3 Oct 2012 12:08:31 +0200
Subject: [PATCH] maint: correct syntax-check failures; adjust NEWS

* tests/pcre-utf8: Reverse order of compare arguments.
Remove all copyright year numbers except 2012.
Use skip_ "diagnostic...", rather than a bare "exit 77".
* NEWS: Start with a concise description of the bug.
* src/pcresearch.c (STREQ): Define, so that we can...
(Pcompile): use STREQ, not strcmp.
---
 NEWS             |    9 +++++----
 src/pcresearch.c |    4 +++-
 tests/pcre-utf8  |   13 +++++++------
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index bc669b9..052cd81 100644
--- a/NEWS
+++ b/NEWS
@@ -4,10 +4,11 @@ GNU grep NEWS                                    -*- outline -*-

 ** Bug fixes

-  While multi-byte mode is only supported by PCRE with UTF-8 locales,
-  grep did not activate it.  This can cause failures to match multibyte
-  characters against some regular expressions, especially those including
-  the '.' or '\p' metacharacters.
+  grep -P could misbehave.  While multi-byte mode is only supported by PCRE
+  with UTF-8 locales, grep did not activate it.  This would cause failures
+  to match multibyte characters against some regular expressions, especially
+  those including the '.' or '\p' metacharacters.
+

 * Noteworthy changes in release 2.14 (2012-08-20) [stable]

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 3539b58..a15f598 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -29,6 +29,8 @@
 # include <langinfo.h>
 #endif

+#define STREQ(a, b) (strcmp (a, b) == 0)
+
 #if HAVE_LIBPCRE
 /* Compiled internal form of a Perl regular expression.  */
 static pcre *cre;
@@ -55,7 +57,7 @@ Pcompile (char const *pattern, size_t size)
   char const *pnul;

 #if defined HAVE_LANGINFO_CODESET
-  if (!strcmp(nl_langinfo(CODESET), "UTF-8"))
+  if (STREQ (nl_langinfo (CODESET), "UTF-8"))
     flags |= PCRE_UTF8;
 #endif

diff --git a/tests/pcre-utf8 b/tests/pcre-utf8
index b86b114..04146ec 100755
--- a/tests/pcre-utf8
+++ b/tests/pcre-utf8
@@ -1,7 +1,7 @@
 #! /bin/sh
 # Ensure that, with -P, Unicode \p{} symbols are correctly matched.
 #
-# Copyright (C) 2001, 2006, 2009-2012 Free Software Foundation, Inc.
+# Copyright (C) 2012 Free Software Foundation, Inc.
 #
 # Copying and distribution of this file, with or without modification,
 # are permitted in any medium without royalty provided the copyright
@@ -13,21 +13,22 @@ require_en_utf8_locale_

 fail=0

-echo '$' | LC_ALL=en_US.UTF-8 grep -qP '\p{S}' || exit 77
+echo '$' | LC_ALL=en_US.UTF-8 grep -qP '\p{S}' \
+  || skip_ 'PCRE support is compiled out'

 euro='\xe2\x82\xac euro'
 printf "$euro\\n" > in || framework_failure_

 LC_ALL=en_US.UTF-8 grep -P '^\p{S}' in > out || fail=1
-compare out in || fail=1
+compare in out || fail=1

 LC_ALL=en_US.UTF-8 grep -P '^. euro$' in > out2 || fail=1
-compare out2 in || fail=1
+compare in out2 || fail=1

 LC_ALL=en_US.UTF-8 grep -oP '. euro' in > out3 || fail=1
-compare out3 in || fail=1
+compare in out3 || fail=1

 LC_ALL=en_US.UTF-8 grep -P '^\P{S}' in > out4
-compare out4 /dev/null || fail=1
+compare /dev/null out4 || fail=1

 Exit $fail
--
1.7.12.1.382.gb0576a6
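
For anyone reading the Pcompile hunk without the surrounding source, here is a rough standalone sketch of the codeset test it performs. It is not the code in src/pcresearch.c: SKETCH_UTF8_FLAG is a made-up stand-in for PCRE_UTF8 so that the file builds without libpcre, and flags_for_codeset and flags_for_current_locale are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

#define STREQ(a, b) (strcmp (a, b) == 0)

/* Assumption: a stand-in for PCRE_UTF8, so that this sketch
   compiles without the PCRE headers.  The value is arbitrary.  */
enum { SKETCH_UTF8_FLAG = 0x0800 };

/* Hypothetical helper: return the compile flags to use for a
   locale whose codeset name is CODESET.  Only an exact "UTF-8"
   match turns the flag on, mirroring the test in the patch.  */
static int
flags_for_codeset (char const *codeset)
{
  assert (codeset != NULL);
  int flags = 0;
  if (STREQ (codeset, "UTF-8"))
    flags |= SKETCH_UTF8_FLAG;
  return flags;
}

/* Hypothetical helper: ask the current locale for its codeset.
   The caller is expected to have run setlocale (LC_ALL, "")
   first; otherwise the program is still in the "C" locale.  */
static int
flags_for_current_locale (void)
{
  return flags_for_codeset (nl_langinfo (CODESET));
}
```

A caller would run setlocale (LC_ALL, "") once at startup and then OR the result of flags_for_current_locale () into whatever flags it passes to the regex compiler. The comparison is deliberately exact, as in the patch, so a codeset spelled differently from "UTF-8" would leave the flag off.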
