Jim Meyering <j...@meyering.net> wrote:

> On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <at...@pld-linux.org> wrote:
> > I've just happened to notice a difference in behavior between sed 4.5
> > and 4.6 when building VirtualBox. It seems to be locale dependent:
> >
> > $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(bar
> >
> > $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(foo *
> >
> > In 4.5 both results are the same -- same as the second output with
> > LC_ALL=C.UTF-8.
>
> Thanks a lot for that report.
> This is indeed a regression. It also affects the just-released
> grep-3.2, since the source is in a file used by both: gnulib's dfa.c.
> I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
> To back that out, I must first revert part of this fix-up patch:
> v0.1-2281-g95cd86dd7
>
> Here's a demonstrator with grep (it should match, but with 3.2 it does not):
>
> $ echo 123-x|LC_ALL=C grep '.\bx'
> $
>
> To avoid the failure, one can:
> - specify -P (for PCRE, a different matcher), or
> - avoid the C locale and use a multi-byte locale like the one you
>   chose, which inhibits use of the DFA matcher, because \b's
>   definition requires multi-byte aware machinery not present in the
>   DFA matcher.
>
> I expect to revert the mentioned gnulib commits, and then to
> make new releases of both grep and sed.
Please add a test case ...

Thanks,

Arnold
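
A regression check along the lines requested could look roughly like the sketch below. It is not the test that was actually added to grep or sed; the function names are made up for illustration, and it assumes GNU grep and GNU sed are first on PATH (\b is a GNU extension). It reuses the two reproducers quoted above, both run under LC_ALL=C, where the DFA matcher is in use.

```sh
#!/bin/sh
# Hypothetical regression check for \b handling in the C locale.
# Function names are illustrative only.

# grep reproducer: '.\bx' should match "123-x" in the C locale.
check_grep_word_boundary() {
  if printf '123-x\n' | LC_ALL=C grep '.\bx' >/dev/null 2>&1; then
    echo pass
  else
    echo fail
  fi
}

# sed reproducer: the substitution should apply in the C locale too,
# giving the same result reported for LC_ALL=C.UTF-8.
check_sed_word_boundary() {
  out=$(printf 'foo(bar \n' |
        LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g')
  # Command substitution strips only the trailing newline,
  # so the trailing space in the expected string is significant.
  if [ "$out" = 'foo(foo * ' ]; then
    echo pass
  else
    echo fail
  fi
}

echo "grep: $(check_grep_word_boundary)"
echo "sed:  $(check_sed_word_boundary)"
```

Going by the report above, an affected build (grep 3.2, sed 4.6) would be expected to print "fail" for both checks.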