I've recently done some bug-report maintenance on a set of GNU grep bug reports about whether "grep -P '\d'" should match non-ASCII digits, and have some thoughts about coordinating GNU grep with git grep in this department.

GNU Bug#62605[1] "`[\d]` does not work with PCRE" has been fixed on Savannah's copy of GNU grep, and some sort of fix should appear in the next grep release. However, I'm leaving the GNU grep bug report open for now because it's related to Bug#60690[2] "[PATCH v2] grep: correctly identify utf-8 characters with \{b,w} in -P" and to Bug#62552[3] "Bug found in latest stable release v3.10 of grep". I merged these related bug reports, and the oldest one, Bug#60690, is now the representative displayed in the GNU grep bug list[4].

For this set of grep bug reports there's still a pending issue, discussed in my recent email[5]. That email proposes a patch, so I've tagged Bug#60690 with "patch". The proposal is that GNU grep -P '\d' should revert to the grep 3.9 behavior, i.e., that in a UTF-8 locale, \d should also match non-ASCII decimal digits.
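
To make the two candidate behaviors concrete, here is a minimal sketch that uses Python's re module as a stand-in. It is only an analogy: Python's re is not PCRE2, and the helper name matches() and the sample strings are made up for illustration. By default Python's \d matches any Unicode decimal digit, roughly like the behavior proposed above for a UTF-8 locale, while the re.ASCII flag restricts \d to 0-9, roughly like an ASCII-only \d.

```python
import re

# Arabic-Indic digits for "123": non-ASCII decimal digits (Unicode category Nd).
# Sample data chosen for illustration only.
ARABIC_INDIC = "\u0661\u0662\u0663"

# Default for str patterns in Python: \d matches any Unicode decimal digit.
unicode_digit = re.compile(r"\d")

# With re.ASCII, \d matches only the ASCII digits 0-9.
ascii_digit = re.compile(r"\d", re.ASCII)


def matches(pattern, text):
    """Return True if pattern matches anywhere in text (hypothetical helper,
    loosely mirroring how grep selects a line)."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for label, text in [("ASCII", "123"), ("Arabic-Indic", ARABIC_INDIC)]:
        print(label,
              "unicode \\d:", matches(unicode_digit, text),
              "ascii \\d:", matches(ascii_digit, text))
```

Both patterns accept the ASCII string; only the Unicode-aware pattern accepts the Arabic-Indic one, which is the kind of difference the pending proposal would settle for grep -P.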

In researching this a bit further, I found that on March 23 Git disabled the use of PCRE2_UCP in PCRE2 10.34 or earlier[6], due to a PCRE2 bug that can cause a crash when PCRE2_UCP is used[7]. A bug fix[8] should appear in the next PCRE2 release.

When PCRE2 10.35 comes out, it appears that 'git grep -P' will behave like 'grep -P' only if GNU grep adopts something like the solution proposed in [5].

[1]: https://bugs.gnu.org/62605
[2]: https://bugs.gnu.org/60690
[3]: https://bugs.gnu.org/62552
[4]: https://debbugs.gnu.org/cgi/pkgreport.cgi?package=grep
[5]: https://lists.gnu.org/archive/html/grep-devel/2023-04/msg00004.html
[6]: https://github.com/git/git/commit/14b9a044798ebb3858a1f1a1377309a3d6054ac8
[7]: https://lore.kernel.org/git/7e83daa1-f9a9-4151-8d07-d80ea6d59...@clumio.com/
[8]: https://github.com/git/git/commit/14b9a044798ebb3858a1f1a1377309a3d6054ac8


