On 4/3/23 13:03, Greg Wooledge wrote:
On Mon, Apr 03, 2023 at 12:50:02PM -0700, David Christensen wrote:
On 4/3/23 11:47, Greg Wooledge wrote:
Might be cleaner just to rewrite it from scratch.  Especially since
it mixes multiple invocations of perl together with (unsafe!) xargs and
other shell commands....


Please clarify "unsafe" and describe "safe" alternative(s).

The standard POSIX xargs command is completely unsuitable for use
with filenames as input, for (at least) two reasons:

1) Despite popular belief, xargs does not split input into lines.  It
    splits input into *words*, using any whitespace as delimiters.
    Therefore it fails if any of the input filenames contains whitespace.

    unicorn:~$ echo /stuff/music/Frank_Zappa/15\ The\ Return\ Of\ The\ Son\ Of\ Monster\ Magnet.mp3 | xargs ls -ld
    ls: cannot access '/stuff/music/Frank_Zappa/15': No such file or directory
    ls: cannot access 'The': No such file or directory
    ls: cannot access 'Return': No such file or directory
    ls: cannot access 'Of': No such file or directory
    ls: cannot access 'The': No such file or directory
    ls: cannot access 'Son': No such file or directory
    ls: cannot access 'Of': No such file or directory
    ls: cannot access 'Monster': No such file or directory
    ls: cannot access 'Magnet.mp3': No such file or directory

2) xargs actually uses quotes (single or double) in the input stream to
    delimit words, overriding the whitespace delimiters.  So, a filename
    that contains a quoted section will be handled even more surprisingly:

    unicorn:~$ echo 'foo "bar b q" baz.txt' | xargs ls -ld
    ls: cannot access 'bar b q': No such file or directory
    ls: cannot access 'baz.txt': No such file or directory
    -rwxr-xr-x 1 greg greg 386 Apr  3 14:39  foo

    If the filename doesn't contain a balanced pair of quote marks, then
    it simply explodes.

    unicorn:~$ echo "You can't do that on television.mp4" | xargs ls -ld
    xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
    ls: cannot access 'You': No such file or directory

Altogether, any filename with whitespace OR a single quote OR a double
quote will break POSIX xargs.
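
The splitting described in (1) and (2) can be observed without touching any real files. The following is a minimal sketch, not part of the original exchange: it replaces ls with a small sh -c helper that reports how many arguments xargs handed over. The input names are made up for illustration.

```shell
#!/bin/sh
# Count the arguments the child command receives for ONE intended file name.
# sh -c 'echo "$#"' sh  prints the number of arguments that xargs appended.

# A single name containing two spaces: xargs splits it on whitespace.
words_space=$(printf '%s\n' 'a b c.mp3' | xargs sh -c 'echo "$#"' sh)

# A single name containing a double-quoted section: the quotes group
# "bar b q" into one word, while the outer parts are still split off.
words_quote=$(printf '%s\n' 'foo "bar b q" baz.txt' | xargs sh -c 'echo "$#"' sh)

echo "whitespace name -> $words_space arguments"
echo "quoted name     -> $words_quote arguments"
```

In both cases the helper should report more than one argument for what was meant to be a single file name.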

There is no mitigation using the POSIX option set.  None.  At all.  The
command is unsuitable for general use.

The GNU version of xargs, however, adds a -0 option:

        -0, --null
               Input items are terminated by a null  character  instead  of  by
               whitespace,  and the quotes and backslash are not special (every
               character is taken literally).  Disables the end of file string,
               which  is  treated  like  any other argument.  Useful when input
               items might contain white space, quote  marks,  or  backslashes.
               The  GNU  find  -print0  option produces input suitable for this
               mode.

With this option, you can supply a stream of NUL-delimited filenames
to xargs -0, and process them safely.  No explosions will occur, no matter
what filenames are passed.

Feeding filenames to xargs -0 is usually done with either find -print0
(which is another GNU extension, also supported on modern BSD), or with
something equivalent to printf '%s\0'.  Other input sources are possible,
but those are the big two.
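
As a rough illustration of those two input sources, here is a sketch that builds a scratch directory with awkward names and feeds them to xargs -0 both ways. The file names and the argument-counting helper are assumptions made for the demonstration; find -print0 and xargs -0 are the extensions discussed above.

```shell
#!/bin/sh
# Scratch directory holding three deliberately awkward (hypothetical) names.
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/two words.txt" "$tmpdir/can't.txt" "$tmpdir/say \"hi\".txt"

# Source 1: find -print0 emits each path followed by a NUL byte.
count_find=$(find "$tmpdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

# Source 2: printf '%s\0' terminates each glob result with a NUL byte.
count_printf=$(printf '%s\0' "$tmpdir"/* | xargs -0 sh -c 'echo "$#"' sh)

echo "find -print0   -> $count_find arguments"
echo "printf '%s\\0' -> $count_printf arguments"

rm -rf "$tmpdir"
```

Each pipeline should hand exactly one argument per file to the child command, regardless of the spaces and quote marks in the names.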


Yes, you are right.  Thank you.


The initial one-liner worked because every file in my PATH is a conventional Unix file name. I have upgraded the script to use NUL delimiters for input to xargs(1):


Dell Precision 3630:
1 @ Xeon E-2174G
2 @ 16 GB DDR-2666 ECC
1 @ Intel 520 Series SSD 60 GB


2023-04-03 13:17:42 root@taz ~
# cat /etc/debian_version; uname -a
11.6
Linux taz 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux


2023-04-03 13:17:46 root@taz ~
# cat /usr/local/bin/survey-path-file
#!/bin/sh
# $Id: survey-path-file,v 1.4 2023/04/03 20:09:47 dpchrist Exp $
# by David Paul Christensen dpchr...@holgerdanske.com
# Public Domain
#
# Run file(1) for files in PATH.  Count and print frequency of results.

echo "$PATH" \
| tr ':' '\n' \
| perl -MFile::Slurp -ne 'chomp;print map {"$_\0"} read_dir($_,prefix=>1)' \
| xargs -0 file \
| perl -pe 's/\S+\s+//' \
| grep -v 'symbolic link' \
| perl -pe 's/, dynamically linked.+//' \
| sort \
| uniq -c \
| sort -rn


2023-04-03 13:18:19 root@taz ~
# time survey-path-file
   1872 ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)
    359 POSIX shell script, ASCII text executable
    192 Perl script text executable
     40 Python script, ASCII text executable
     36 Bourne-Again shell script, ASCII text executable
     30 setuid ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)
     20 ELF 64-bit LSB executable, x86-64, version 1 (SYSV)
     16 setgid ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)
     14 Tcl script, ASCII text executable
     10 ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux)
      8 POSIX shell script, UTF-8 Unicode text executable
      4 Python script, UTF-8 Unicode text executable
      4 POSIX shell script, ASCII text executable, with very long lines
      2 a /usr/bin/env sh script, ASCII text executable
      2 a /bin/mksh script, UTF-8 Unicode text executable
      2 Python script, ISO-8859 text executable
      2 Java source, UTF-8 Unicode text
      2 ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux)
      2 Bourne-Again shell script, ASCII text executable, with very long lines

real    0m0.714s
user    0m0.623s
sys     0m0.131s
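
For comparison, the directory-listing stage could also be written without Perl. This is only a sketch under assumptions, not the script above: list_path_entries is a hypothetical helper name, and -mindepth, -maxdepth and -print0 are GNU/BSD find extensions rather than POSIX options.

```shell
#!/bin/sh
# list_path_entries LIST: for each directory in the colon-separated LIST,
# print every directory entry as a NUL-terminated path.
list_path_entries() (
    # Subshell body, so the IFS and noglob changes do not leak to the caller.
    IFS=:
    set -f
    for dir in $1; do
        # Skip empty elements and anything that is not a directory.
        [ -d "$dir" ] || continue
        # -H follows a symlinked directory given on the command line.
        find -H "$dir" -mindepth 1 -maxdepth 1 -print0
    done
)

# Demonstration on scratch directories with hypothetical names.
demo=$(mktemp -d) || exit 1
mkdir "$demo/bin one" "$demo/bin2"
touch "$demo/bin one/a b" "$demo/bin one/it's" "$demo/bin2/c"
n=$(list_path_entries "$demo/bin one:$demo/bin2" | xargs -0 sh -c 'echo "$#"' sh)
echo "entries: $n"
rm -rf "$demo"
```

In the survey pipeline this would be used as list_path_entries "$PATH" | xargs -0 file, with the remaining stages unchanged.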


David
