On 4/3/23 13:03, Greg Wooledge wrote:
On Mon, Apr 03, 2023 at 12:50:02PM -0700, David Christensen wrote:
On 4/3/23 11:47, Greg Wooledge wrote:
Might be cleaner just to rewrite it from scratch. Especially since
it mixes multiple invocations of perl together with (unsafe!) xargs and
other shell commands....
Please clarify "unsafe" and describe "safe" alternative(s).
The standard POSIX xargs command is completely unsuitable for use
with filenames as input, for (at least) two reasons:
1) Despite popular belief, xargs does not split input into lines. It
splits input into *words*, using any whitespace as delimiters.
Therefore it fails if any of the input filenames contains whitespace.
unicorn:~$ echo /stuff/music/Frank_Zappa/15\ The\ Return\ Of\ The\ Son\ Of\ Monster\ Magnet.mp3 | xargs ls -ld
ls: cannot access '/stuff/music/Frank_Zappa/15': No such file or directory
ls: cannot access 'The': No such file or directory
ls: cannot access 'Return': No such file or directory
ls: cannot access 'Of': No such file or directory
ls: cannot access 'The': No such file or directory
ls: cannot access 'Son': No such file or directory
ls: cannot access 'Of': No such file or directory
ls: cannot access 'Monster': No such file or directory
ls: cannot access 'Magnet.mp3': No such file or directory
2) xargs actually uses quotes (single or double) in the input stream to
delimit words, overriding the whitespace delimiters. So, a filename
that contains a quoted section will be handled even more surprisingly:
unicorn:~$ echo 'foo "bar b q" baz.txt' | xargs ls -ld
ls: cannot access 'bar b q': No such file or directory
ls: cannot access 'baz.txt': No such file or directory
-rwxr-xr-x 1 greg greg 386 Apr 3 14:39 foo
If the filename doesn't contain a balanced pair of quote marks, then
it simply explodes.
unicorn:~$ echo "You can't do that on television.mp4" | xargs ls -ld
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
ls: cannot access 'You': No such file or directory
Altogether, any filename with whitespace OR a single quote OR a double
quote will break POSIX xargs.
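A quick way to see the word splitting without creating any files is to count the arguments xargs passes on. This is only a sketch: count_args is a made-up helper name, and sh -c 'echo "$#"' sh simply prints how many arguments it was given.

```shell
#!/bin/sh
# Count how many arguments xargs hands to the command for ONE input line.
# count_args is a hypothetical helper, named here for illustration only.
count_args() {
    xargs sh -c 'echo "$#"' sh
}

# One filename containing a space arrives as two arguments.
spaces=$(echo 'two words.txt' | count_args)

# One filename containing a quoted section arrives as three arguments:
# foo, bar b q, and baz.txt.
quoted=$(echo 'foo "bar b q" baz.txt' | count_args)

echo "name with a space:  $spaces arguments"
echo "name with quotes:   $quoted arguments"
```

In both cases a single filename went in, so any count other than 1 means the command is operating on names that do not exist.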
There is no mitigation using the POSIX option set. None. At all. The
command is unsuitable for general use.
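Staying strictly within POSIX means stepping around xargs rather than fixing it. When the filenames come from a directory walk, find's -exec ... {} + form hands the names to the command directly as arguments, so they are never parsed as text. A minimal sketch, assuming mktemp -d is available (it is not in POSIX) and using made-up file names:

```shell
#!/bin/sh
# Sketch: POSIX find -exec ... {} + as a way to avoid xargs entirely.
# Assumption: mktemp -d exists; the file names are invented for the demo.
workdir=$(mktemp -d) || exit 1
touch "$workdir/two words.txt" "$workdir/You can't do that.mp4"

# {} + batches the filenames into as few invocations as possible,
# much like xargs, but each name stays one intact argument.
count=$(find "$workdir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "arguments received: $count"
rm -rf "$workdir"
```

This only helps when find is the source of the names; it does nothing for a list that arrives on a pipe from some other program.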
The GNU version of xargs, however, adds a -0 option:
-0, --null
Input items are terminated by a null character instead of by
whitespace, and the quotes and backslash are not special (every
character is taken literally). Disables the end of file string,
which is treated like any other argument. Useful when input
items might contain white space, quote marks, or backslashes.
The GNU find -print0 option produces input suitable for this
mode.
With this option, you can supply a stream of NUL-delimited filenames
to xargs -0, and process them safely. No explosions will occur, no matter
what filenames are passed.
Feeding filenames to xargs -0 is usually done with either find -print0
(which is another GNU extension, also supported on modern BSD), or with
something equivalent to printf '%s\0'. Other input sources are possible,
but those are the big two.
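The two input sources can be tried side by side. This is a sketch, not a recipe: it assumes a find with -print0 and an xargs with -0 (GNU or modern BSD), plus mktemp -d, and the file names are invented to cover the three failure cases above.

```shell
#!/bin/sh
# Sketch: feed NUL-delimited filenames to xargs -0 from two sources.
# Assumptions: find -print0, xargs -0, mktemp -d; demo names are made up.
workdir=$(mktemp -d) || exit 1
touch "$workdir/two words.txt" \
      "$workdir/You can't do that.mp4" \
      "$workdir/foo \"bar b q\" baz.txt"

# Source 1: find -print0 walks the tree and emits NUL-terminated paths.
from_find=$(find "$workdir" -type f -print0 | xargs -0 ls -d | wc -l | tr -d ' ')

# Source 2: printf '%s\0' turns an argument list (here a glob) into the
# same NUL-terminated format.
from_glob=$(printf '%s\0' "$workdir"/* | xargs -0 ls -d | wc -l | tr -d ' ')

echo "via find -print0: $from_find files listed"
echo "via printf:       $from_glob files listed"
rm -rf "$workdir"
```

Three files are created and both pipelines list exactly three, with no "cannot access" errors and no complaint about unmatched quotes.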
Yes, you are right. Thank you.
The initial one-liner worked because every file in my PATH is a
conventional Unix file name. I have upgraded the script to use NUL
delimiters for input to xargs(1):
Dell Precision 3630:
1 @ Xeon E-2174G
2 @ 16 GB DDR-2666 ECC
1 @ Intel 520 Series SSD 60 GB
2023-04-03 13:17:42 root@taz ~
# cat /etc/debian_version; uname -a
11.6
Linux taz 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
2023-04-03 13:17:46 root@taz ~
# cat /usr/local/bin/survey-path-file
#!/bin/sh
# $Id: survey-path-file,v 1.4 2023/04/03 20:09:47 dpchrist Exp $
# by David Paul Christensen dpchr...@holgerdanske.com
# Public Domain
#
# Run file(1) for files in PATH. Count and print frequency of results.
echo $PATH \
| tr ':' '\n' \
| perl -MFile::Slurp -ne 'chomp;print map {"$_\0"} read_dir($_,prefix=>1)' \
| xargs -0 file \
| perl -pe 's/\S+\s+//' \
| grep -v 'symbolic link' \
| perl -pe 's/, dynamically linked.+//' \
| sort \
| uniq -c \
| sort -rn
2023-04-03 13:18:19 root@taz ~
# time survey-path-file
1872 ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)
359 POSIX shell script, ASCII text executable
192 Perl script text executable
40 Python script, ASCII text executable
36 Bourne-Again shell script, ASCII text executable
30 setuid ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)
20 ELF 64-bit LSB executable, x86-64, version 1 (SYSV)
16 setgid ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)
14 Tcl script, ASCII text executable
10 ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux)
8 POSIX shell script, UTF-8 Unicode text executable
4 Python script, UTF-8 Unicode text executable
4 POSIX shell script, ASCII text executable, with very long lines
2 a /usr/bin/env sh script, ASCII text executable
2 a /bin/mksh script, UTF-8 Unicode text executable
2 Python script, ISO-8859 text executable
2 Java source, UTF-8 Unicode text
2 ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux)
2 Bourne-Again shell script, ASCII text executable, with very long lines
real 0m0.714s
user 0m0.623s
sys 0m0.131s
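One wrinkle may remain in the pipeline above: s/\S+\s+// removes text only up to the first run of whitespace, so a path containing a space would leave part of the filename in the output. A possible variant, offered as a sketch and not as the script's actual next revision, lets file -b omit the filename and replaces the perl stages with find and sed. The function names are invented here, and it assumes a find that supports -mindepth, -maxdepth and -print0.

```shell
#!/bin/sh
# Hypothetical variant of survey-path-file; function names are invented.
# Assumptions: find with -mindepth/-maxdepth/-print0, xargs -0, file -b.

# Print every entry of each directory in a PATH-style string,
# NUL-terminated. Empty or missing directories are skipped.
list_path_entries() {
    old_ifs=$IFS
    IFS=:      # split on colons only
    set -f     # no globbing while $1 is expanded unquoted
    for dir in $1; do
        if [ -d "$dir" ]; then
            # -H follows a symlinked directory such as /bin -> usr/bin
            find -H "$dir" -mindepth 1 -maxdepth 1 -print0
        fi
    done
    set +f
    IFS=$old_ifs
}

# Count file(1) descriptions. -b drops the filename, so nothing has to
# be stripped with a regex afterwards.
survey_path() {
    list_path_entries "$1" \
    | xargs -0 file -b \
    | grep -v 'symbolic link' \
    | sed 's/, dynamically linked.*//' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Calling survey_path "$PATH" should give a frequency table comparable to the one above, though the counts are not guaranteed to match the perl version exactly.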
David