On 09/21/2010 01:49 AM, Paolo Bonzini wrote:
> On 09/21/2010 02:37 AM, Eric Blake wrote:
> >
> > Maybe the sed script in file.sed is non-portable? It's certainly more
> > complex than the normal run-of-the-mill sed script. Or maybe it is that
> > the regex '.' has problems matching non-characters, and the definition
> > of the various locales determines whether 8-bit bytes are characters or
> > not. Is there any portable way to guarantee a single-byte locale where
> > '.' matches all possible 8-bit bytes?
> >
> > More testing shows that 'LC_ALL=en_US.ISO8859-1 sed' on Darwin gives the
> > desired results, so the problem is definitely a matter of whether the C
> > locale treats all 256 byte values as potential matches to '.'.
>
> I think that's a (pretty serious) Darwin bug.

The bug is limited to GNU sed, which happened to be first in PATH on the machine where I reproduced the problem (and I'm guessing that the same thing happened to rochan):

$ printf '\200\n' | LC_ALL=C /usr/bin/sed -n /./p | wc -l
1
$ printf '\200\n' | LC_ALL=C sed -n /./p | wc -l
0
$ which sed
/usr/local/bin/sed
$ sed --version | head -n1
GNU sed version 4.2
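The one-byte check above can be extended to the whole upper half of the byte range. This is a minimal sketch, assuming a POSIX shell whose printf understands octal escapes (\NNN) in its format string. The function name count_high_byte_matches is hypothetical and not part of any existing tool. A sed that treats every byte as a character in the C locale should report 128. The broken build shown above would report fewer, because it already fails on \200.

```shell
#!/bin/sh
# Hypothetical helper: count how many of the 128 high-bit bytes
# (octal 200 through 377) the sed named by $1 (default: first sed in PATH)
# matches with the regex '.' in the C locale.
count_high_byte_matches() {
  sed_prog=${1:-sed}
  match_count=0
  for hi in 2 3; do
    for mid in 0 1 2 3 4 5 6 7; do
      for lo in 0 1 2 3 4 5 6 7; do
        # "\\$hi$mid$lo\\n" becomes a format like \200\n, which printf
        # turns into that single byte followed by a newline.
        # wc -l yields 1 if sed printed the line (i.e. '.' matched), else 0.
        match_count=$((match_count + $(printf "\\$hi$mid$lo\\n" |
          LC_ALL=C "$sed_prog" -n '/./p' | wc -l)))
      done
    done
  done
  echo "$match_count"
}
```

For example, comparing `count_high_byte_matches /usr/bin/sed` with `count_high_byte_matches /usr/local/bin/sed` on the affected machine should show whether the failure is limited to \200 or covers the whole high half.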

It's nice that the system sed is immune, and I wonder what GNU sed is getting tripped up on. Maybe the autoconf fix is a matter of doing a best-tool search for a sed that handles 8-bit bytes, which would reject this broken GNU sed build in favor of the system sed, even with its other limitations?
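A best-tool search along those lines might look roughly like the following sketch. It walks PATH in order and keeps the first sed whose '.' matches high-bit bytes in the C locale. The helper names sed_handles_8bit and find_8bit_clean_sed are hypothetical, and this is not the search code Autoconf actually uses; a real check would likely probe more than two bytes.

```shell
#!/bin/sh
# Hypothetical probe: succeed if the program at $1 behaves like a sed that
# treats the bytes octal 200 and 377 as single characters in the C locale.
# The probe line holds exactly those two bytes, so '^..$' should match it.
sed_handles_8bit() {
  probe_out=$(printf '\200\377\n' | LC_ALL=C "$1" -n '/^..$/p' 2>/dev/null)
  [ -n "$probe_out" ]
}

# Hypothetical search: print the first sed in PATH that passes the probe,
# or return failure if none does.
find_8bit_clean_sed() {
  save_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    IFS=$save_ifs
    # An empty PATH component means the current directory.
    [ -n "$dir" ] || dir=.
    if [ -x "$dir/sed" ] && sed_handles_8bit "$dir/sed"; then
      echo "$dir/sed"
      return 0
    fi
  done
  IFS=$save_ifs
  return 1
}
```

Because the search stops at the first passing candidate, a broken GNU sed that is first in PATH (such as /usr/local/bin/sed above) would be skipped in favor of a later working one such as /usr/bin/sed.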

--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
