On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <at...@pld-linux.org> wrote:
> I've just happened to notice a difference in behavior between sed 4.5 and 4.6
> when building VirtualBox. It seems to be locale dependent:
> $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(bar
> $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(foo *
> In 4.5 both results are the same -- same as the second output with

Thanks a lot for that report.
This is indeed a regression. It also affects the just-released
grep-3.2, since the cause lies in a file used by both: gnulib's dfa.c.
I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
To back that out, I must first revert part of this fix-up patch:

Here's a demonstrator with grep (it should match, but with 3.2 it does not):

$ echo 123-x|LC_ALL=C grep '.\bx'
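
To check whether an installed grep is affected, the demonstrator can be
wrapped in a small probe. This is only a sketch: the function name
check_word_boundary is made up for illustration, and it assumes a grep
that accepts the GNU \b extension.

```shell
# Hypothetical helper (name is illustrative, not part of grep or gnulib).
# Runs the demonstrator in the C locale and reports whether it matched.
check_word_boundary() {
  # '.' should match '-', and \b should match between '-' and 'x'.
  if printf '123-x\n' | LC_ALL=C grep -q '.\bx'; then
    echo ok
  else
    echo affected
  fi
}

check_word_boundary
```

A result of "affected" means the pattern failed to match in the C locale,
which is the symptom described above.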

To avoid the failure, one can:
- specify -P (for PCRE, a different matcher), or
- avoid the C locale and use a multi-byte locale instead, like the
one you chose, which inhibits use of the DFA matcher, because \b's
definition requires multi-byte aware machinery not present in the DFA.
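
The two workarounds can be compared side by side with something like the
following. It is a rough sketch under a few assumptions: the helper name
try is invented here, grep must have been built with PCRE support for -P
to work, and the C.UTF-8 locale must exist on the system (if it does not,
the last line silently behaves like the C locale).

```shell
# Hypothetical helper: feed the test input to the given command and
# report whether it matched. A failure can also mean the option or
# locale is not supported on this system.
try() {
  label=$1; shift
  if printf '123-x\n' | "$@" >/dev/null 2>&1; then
    echo "$label: match"
  else
    echo "$label: no match (or unsupported)"
  fi
}

try 'C locale'     env LC_ALL=C grep '.\bx'        # the failing case with 3.2
try 'C locale, -P' env LC_ALL=C grep -P '.\bx'     # workaround 1: PCRE matcher
try 'C.UTF-8'      env LC_ALL=C.UTF-8 grep '.\bx'  # workaround 2: multi-byte locale
```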

I expect to revert the mentioned gnulib commits, and then to
make new releases of both grep and sed.
