On Tue, 18 Oct 2022 05:45:02 -0500
Rob Landley <[email protected]> wrote:
> $ echo -e 'one\0two' | busybox grep -l ^t
> (standard input)
From the busybox grep source:
/* BB_AUDIT GNU defects - always acts as -a. */
$ man grep | grep -A5 "^\s*-z,"
       -z, --null-data
              Treat input and output data as sequences of lines, each
              terminated by a zero byte (the ASCII NUL character) instead of a
              newline. Like the -Z or --null option, this option can be used
              with commands like sort -z to process arbitrary file names.
$ echo -e "one\0two" | ./busybox grep -l ^t
$ echo -e "one\0two" | ./busybox grep -la ^t
$ echo -e "one\0two" | ./busybox grep -laz ^t
(standard input)
$ grep --version | head -n1
grep (GNU grep) 3.8
$ echo -e "one\0two" | grep -l ^t
(standard input)
$ echo -e "one\0two" | grep -la ^t
$ echo -e "one\0two" | grep -laz ^t
(standard input)
So... why does grep -l match while busybox grep -l does not?
It seems that GNU/the-fabulous grep defaults to --binary-files=binary:
$ echo -e "one\0two" | grep -l --binary-files=text ^t
$ echo -e "one\0two" | grep -l --binary-files=binary ^t
(standard input)
which is what we see above, I think.
From the GNU/the-very-best grep man page:
       --binary-files=TYPE
              If a file's data or metadata indicate that the file contains
              binary data, assume that the file is of type TYPE. Non-text
              bytes indicate binary data; these are either output bytes that
              are improperly encoded for the current locale, or null input
              bytes when the -z option is not given.

              By default, TYPE is binary, and grep suppresses output after
              null input binary data is discovered, and suppresses output
              lines that contain improperly encoded data. When some output is
              suppressed, grep follows any output with a message to standard
              error saying that a binary file matches.

              If TYPE is without-match, when grep discovers null input binary
              data it assumes that the rest of the file does not match; this
              is equivalent to the -I option.

              If TYPE is text, grep processes a binary file as if it were
              text; this is equivalent to the -a option.

              When type is binary, grep may treat non-text bytes as line
              terminators even without the -z option. This means choosing
              binary versus text can affect whether a pattern matches a file.
              For example, when type is binary the pattern q$ might match q
              immediately followed by a null byte, even though this is not
              matched when type is text. Conversely, when type is binary the
              pattern . (period) might not match a null byte.

              Warning: The -a option might output binary garbage, which can
              have nasty side effects if the output is a terminal and if the
              terminal driver interprets some of it as commands. On the other
              hand, when reading files whose text encodings are unknown, it
              can be helpful to use -a or to set LC_ALL='C' in the
              environment, in order to find more matches even if the matches
              are unsafe for direct display.
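
To make the "line terminators" paragraph concrete, here is a rough sketch
(mine, not from the man page). It does not rely on any particular grep's
binary-file heuristics. Instead it uses tr to turn the NUL into a newline,
which I am assuming is a fair stand-in for what --binary-files=binary "may"
do. It also assumes a grep that accepts -a and -c. The variable names are
made up for illustration.

```shell
# Sketch only: tr stands in for "NUL may act as a line terminator".
# text_count and split_count are illustrative names, nothing standard.

# Text mode (-a): "one\0two" is one line that starts with "o", so ^t
# should find nothing. grep exits non-zero on no match, hence "|| true".
text_count=$(printf 'one\0two\n' | grep -a -c '^t' || true)

# NUL rewritten as newline: "two" now starts its own line, so ^t can match.
split_count=$(printf 'one\0two\n' | tr '\000' '\n' | grep -c '^t' || true)

echo "text mode: $text_count, NUL as terminator: $split_count"
```

If the first count is 0 and the second is 1, that is the same difference as
between grep -la and plain grep -l in the GNU runs above.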
thanks,
>
> I note that the gnu/dammit grep in my devuan system (from 2018) also gets this
> wrong without -a, but gets it right with -a?
>
> $ echo -e 'one\0two' | grep -l ^t
> (standard input)
> $ echo -e 'one\0two' | grep -al ^t
> $
>
> Which is just extremely gnu. The gnu/dammit sed gets it right:
>
> $ echo -e 'one\0two' | sed 's/^t/x/' | hd
> 00000000 6f 6e 65 00 74 77 6f 0a |one.two.|
> 00000008
> $ echo -e 'one\0two' | sed 's/t/x/' | hd
> 00000000 6f 6e 65 00 78 77 6f 0a |one.xwo.|
> 00000008
>
> Rob
> _______________________________________________
> busybox mailing list
> [email protected]
> http://lists.busybox.net/mailman/listinfo/busybox