Re: Why is busybox grep matching ^SOL after NUL?

Bernhard Reutner-Fischer Tue, 18 Oct 2022 05:28:22 -0700

On Tue, 18 Oct 2022 05:45:02 -0500
Rob Landley <[email protected]> wrote:


> $ echo -e 'one\0two' | busybox grep -l ^t
> (standard input)

/* BB_AUDIT GNU defects - always acts as -a.  */

$ man grep | grep -A5 "^\s*-z,"
       -z, --null-data
              Treat  input  and  output  data  as  sequences  of  lines,  each
              terminated by a zero byte (the ASCII NUL character) instead of a
              newline.   Like the -Z or --null option, this option can be used
              with commands like sort -z to process arbitrary file names.

$ echo -e "one\0two" | ./busybox grep -l ^t
$ echo -e "one\0two" | ./busybox grep -la ^t
$ echo -e "one\0two" | ./busybox grep -laz ^t
(standard input)
$ grep --version | head -n1
grep (GNU grep) 3.8
$ echo -e "one\0two" | grep -l ^t
(standard input)
$ echo -e "one\0two" | grep -la ^t
$ echo -e "one\0two" | grep -laz ^t
(standard input)
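
The runs above assume an echo that understands -e (the bash builtin does);
some sh implementations (dash, for one) print the -e literally instead.
A sketch of a more portable way to produce and check the same input, using
printf and od, neither of which appears in the original commands:

```shell
# printf interprets \0 itself, independent of the shell's echo.
# od -An -tx1 dumps the bytes as hex; tr strips the spacing.
bytes=$(printf 'one\0two\n' | od -An -tx1 | tr -d ' \n')
echo "$bytes"
```

This should print 6f6e650074776f0a, the same eight bytes as the unmodified
hd dump in the quoted sed example further down.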

So... why does grep -l match while busybox grep -l does not?
It seems that GNU/the-fabulous grep defaults to --binary-files=binary:
$ echo -e "one\0two" | grep -l --binary-files=text ^t
$ echo -e "one\0two" | grep -l --binary-files=binary ^t
(standard input)
which is what we see above, I think.

From the GNU/the-very-best grep:
       --binary-files=TYPE
              If a file's data or metadata indicate  that  the  file  contains
              binary  data,  assume  that  the file is of type TYPE.  Non-text
              bytes indicate binary data; these are either output  bytes  that
              are  improperly  encoded  for  the current locale, or null input
              bytes when the -z option is not given.

              By default, TYPE is binary, and  grep  suppresses  output  after
              null  input  binary  data  is  discovered, and suppresses output
              lines that contain improperly encoded data.  When some output is
              suppressed,  grep  follows any output with a message to standard
              error saying that a binary file matches.

              If TYPE is without-match, when grep discovers null input  binary
              data  it  assumes that the rest of the file does not match; this
              is equivalent to the -I option.

              If TYPE is text, grep processes a binary  file  as  if  it  were
              text; this is equivalent to the -a option.

              When  type  is  binary,  grep  may  treat non-text bytes as line
              terminators even without the -z  option.   This  means  choosing
              binary  versus text can affect whether a pattern matches a file.
              For example, when type is binary the pattern q$  might  match  q
              immediately  followed  by  a  null byte, even though this is not
              matched when type is text.  Conversely, when type is binary  the
              pattern . (period) might not match a null byte.

              Warning:  The  -a  option might output binary garbage, which can
              have nasty side effects if the output is a terminal and  if  the
              terminal driver interprets some of it as commands.  On the other
              hand, when reading files whose text encodings  are  unknown,  it
              can   be  helpful  to  use  -a  or  to  set  LC_ALL='C'  in  the
              environment, in order to find more matches even if  the  matches
              are unsafe for direct display.
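
To make the "non-text bytes as line terminators" paragraph concrete, here
is a rough model of it, not what grep does internally: translate NUL to a
newline (or to an ordinary placeholder byte) before matching, so that grep
itself only ever sees plain text. The helper name sample and the '@'
placeholder are made up for illustration.

```shell
# Hypothetical helper: emits the same bytes as echo -e 'one\0two'.
sample() { printf 'one\0two\n'; }

# Binary-style view: NUL ends a line, so "two" starts a line of its own.
as_terminator=$(sample | tr '\000' '\n' | grep -c '^t' || true)

# Text-style view: NUL is just another byte inside the single line
# "one<NUL>two"; '@' stands in for it here so the input stays plain text.
as_ordinary=$(sample | tr '\000' '@' | grep -c '^t' || true)

echo "NUL as terminator: $as_terminator matching line(s)"
echo "NUL as ordinary byte: $as_ordinary matching line(s)"
```

Under this model the first count is 1 and the second is 0, which lines up
with grep -l matching and grep -la not matching above.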

thanks,

> 
> I note that the gnu/dammit grep in my devuan system (from 2018) also gets this
> wrong without -a, but gets it right with -a?
> 
> $ echo -e 'one\0two' | grep -l ^t
> (standard input)
> $ echo -e 'one\0two' | grep -al ^t
> $
> 
> Which is just extremely gnu. The gnu/dammit sed gets it right:
> 
> $ echo -e 'one\0two' | sed 's/^t/x/' | hd
> 00000000  6f 6e 65 00 74 77 6f 0a                           |one.two.|
> 00000008
> $ echo -e 'one\0two' | sed 's/t/x/' | hd
> 00000000  6f 6e 65 00 78 77 6f 0a                           |one.xwo.|
> 00000008
> 
> Rob
> _______________________________________________
> busybox mailing list
> [email protected]
> http://lists.busybox.net/mailman/listinfo/busybox

