Bug#605142: sed: incorrectly interprets \x hexadecimal escapes sometimes

Assaf Gordon Thu, 23 Feb 2017 20:27:26 -0800


Hello,


Picking up an old sed bug:

Matija Nalis wrote:

echo 'a^c' | sed -e 's/\x5e/b/'
should produce output "abc" (as it does in "ssed" or "perl -pe"), but in GNU sed it produces "ba^c".


I agree this is indeed GNU sed's behavior,
but it's not clear that it is incorrect behavior (or whether there is a correct one).


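GNU sed internally converts escape sequences to their
corresponding characters *before* passing the pattern on
to glibc's regex engine (or doing other operations).
So '\x5e' becomes '^', which the regex engine then treats
as the start-of-line anchor; hence "ba^c".

glibc's regex engine does not itself support these escapes,
which is why sed has to do the conversion.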
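
A minimal sketch of the effect, using only portable syntax so that it does not rely on any \x support. The first command is what the regex engine effectively sees once \x5e has been converted to ^; the second escapes the caret so that it is matched literally. The variable names are only for illustration.

```shell
# After conversion, s/\x5e/b/ is effectively s/^/b/: the caret is an
# anchor, so "b" is inserted at the start of the line.
anchored=$(echo 'a^c' | sed 's/^/b/')
echo "$anchored"    # ba^c

# A backslash-escaped caret is a literal character in a POSIX BRE.
literal=$(echo 'a^c' | sed 's/\^/b/')
echo "$literal"     # abc
```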

A few additional cases:

Un-escaping works in other sed commands:

   $ echo 'a^c' | sed 'y/\x5e/b/'
   abc

GNU grep, which doesn't unescape, passes the string unchanged
to the regex engine, which then doesn't match at all:

   $ echo 'a^c' | gnu-grep-2.26 '\x5e'

As opposed to FreeBSD's grep:

   $ echo 'a^c' | freebsd-grep-2.5.1 '\x5e'
   a^c
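
For a script that only needs to find a literal caret, one way to avoid depending on how a given grep treats \x5e is to leave the escape out altogether. A small sketch, with illustrative variable names:

```shell
# -F treats the pattern as a fixed string, so '^' is not an anchor here.
fixed=$(echo 'a^c' | grep -F '^')
echo "$fixed"       # a^c

# In a basic regular expression, \^ also matches a literal caret.
escaped=$(echo 'a^c' | grep '\^')
echo "$escaped"     # a^c
```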

Many other sed implementations don't support unescaping
at all, and return:

   $ echo 'a^c' | sed 's/\x5e/b/'
   a^c
   $ echo 'a^c' | sed 'y/\x5e/b/'
   sed: 1: "y/\x5e/b/": transform strings are not the same length

This is the case for sed from FreeBSD, OpenBSD, BusyBox and ToyBox.

Lastly, POSIX does not mention anything about '\xXX' escape
sequences, so this is left to implementors to decide.
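
Since the escape is not specified, a script that has to run on several sed implementations could let printf produce the byte instead. The following is only a sketch under that assumption; the variable names are made up for the example, and it handles just this one character (a '/', a backslash or a newline would need additional escaping).

```shell
# printf's \136 is the octal escape for 0x5e ('^'); octal escapes in the
# printf format string are specified by POSIX, unlike \xHH in sed.
ch=$(printf '\136')

# y/// takes characters literally, so the byte can be used as-is.
via_y=$(echo 'a^c' | sed "y/$ch/b/")
echo "$via_y"       # abc

# In s/// the caret must be escaped to stop it acting as an anchor.
# Inside double quotes, \\ yields a single backslash before the caret.
via_s=$(echo 'a^c' | sed "s/\\$ch/b/")
echo "$via_s"       # abc
```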

As such, I would suggest marking #605142 as notabug and closing it.

I will add some info about it to sed's manual.

If someone wants to suggest this as a new feature,
please write to sed-de...@gnu.org .
However, since this would change existing behavior, the bar for accepting
it might be high.

regards,
- assaf
