On Mon, Apr 03, 2023 at 12:50:02PM -0700, David Christensen wrote:
> On 4/3/23 11:47, Greg Wooledge wrote:
> > Might be cleaner just to rewrite it from scratch.  Especially since
> > it mixes multiple invocations of perl together with (unsafe!) xargs and
> > other shell commands....
> 
> 
> Please clarify "unsafe" and describe "safe" alternative(s).

The standard POSIX xargs command is completely unsuitable for use
with filenames as input, for (at least) two reasons:

1) Despite popular belief, xargs does not split input into lines.  It
   splits input into *words*, using any whitespace as delimiters.
   Therefore it fails if any of the input filenames contains whitespace.

   unicorn:~$ echo /stuff/music/Frank_Zappa/15\ The\ Return\ Of\ The\ Son\ Of\ Monster\ Magnet.mp3 | xargs ls -ld
   ls: cannot access '/stuff/music/Frank_Zappa/15': No such file or directory
   ls: cannot access 'The': No such file or directory
   ls: cannot access 'Return': No such file or directory
   ls: cannot access 'Of': No such file or directory
   ls: cannot access 'The': No such file or directory
   ls: cannot access 'Son': No such file or directory
   ls: cannot access 'Of': No such file or directory
   ls: cannot access 'Monster': No such file or directory
   ls: cannot access 'Magnet.mp3': No such file or directory

2) xargs actually uses quotes (single or double) in the input stream to
   delimit words, overriding the whitespace delimiters.  So, a filename
   that contains a quoted section will be handled even more surprisingly:

   unicorn:~$ echo 'foo "bar b q" baz.txt' | xargs ls -ld
   ls: cannot access 'bar b q': No such file or directory
   ls: cannot access 'baz.txt': No such file or directory
   -rwxr-xr-x 1 greg greg 386 Apr  3 14:39  foo

   If the filename doesn't contain a balanced pair of quote marks, then
   it simply explodes.

   unicorn:~$ echo "You can't do that on television.mp4" | xargs ls -ld
   xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
   ls: cannot access 'You': No such file or directory

All together, any filename with whitespace OR a single quote OR a double
quote will break POSIX xargs.
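
As a rough illustration of that rule, here is a small shell function that
reports whether a given name contains any of the characters described above.
The function name breaks_posix_xargs is made up for this sketch; it is not
part of xargs or of any standard tool.

```shell
# Hypothetical helper (not a real tool): succeeds if the name contains
# whitespace, a single quote, or a double quote, i.e. the characters
# that the examples above show being misread by plain xargs.
breaks_posix_xargs() {
    case $1 in
        *[[:space:]]*|*"'"*|*'"'*) return 0 ;;
        *) return 1 ;;
    esac
}

for name in 'foo "bar b q" baz.txt' "You can't do that on television.mp4" 'plain.txt'
do
    if breaks_posix_xargs "$name"; then
        echo "unsafe for plain xargs: $name"
    else
        echo "ok: $name"
    fi
done
```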

There is no mitigation using the POSIX option set.  None.  At all.  The
command is unsuitable for general use.

The GNU version of xargs, however, adds a -0 option:

       -0, --null
              Input items are terminated by a null  character  instead  of  by
              whitespace,  and the quotes and backslash are not special (every
              character is taken literally).  Disables the end of file string,
              which  is  treated  like  any other argument.  Useful when input
              items might contain white space, quote  marks,  or  backslashes.
              The  GNU  find  -print0  option produces input suitable for this
              mode.

With this option, you can supply a stream of NUL-delimited filenames
to xargs -0, and process them safely.  No explosions will occur, no matter
what filenames are passed.
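
A minimal sketch of the difference, assuming an xargs that supports -0
(GNU or a modern BSD).  The name 'a b.txt' is only an example; no file needs
to exist, because echo just prints whatever arguments xargs hands to it.

```shell
# Plain xargs splits the single name on the space, so echo runs twice.
words=$(printf '%s\n' 'a b.txt' | xargs -n 1 echo | wc -l)

# With NUL-terminated input and -0, the name arrives as one argument.
items=$(printf '%s\0' 'a b.txt' | xargs -0 -n 1 echo | wc -l)

echo "plain xargs saw $((words)) arguments"   # 2
echo "xargs -0 saw $((items)) argument"       # 1
```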

Feeding filenames to xargs -0 is usually done with either find -print0
(which is another GNU extension, also supported on modern BSD), or with
something equivalent to printf '%s\0'.  Other input sources are possible,
but those are the big two.
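
For example, the find variant might look like the sketch below.  It assumes
find -print0, xargs -0 and mktemp -d are all available (none of them is in
the POSIX option set discussed above), and the directory and file names are
invented purely for the demonstration.

```shell
# Scratch directory holding names that would break plain xargs.
tmpdir=$(mktemp -d)
touch "$tmpdir/a b.txt" "$tmpdir/You can't do that on television.mp4"

# find terminates each pathname with NUL; xargs -0 splits only on NUL,
# so ls receives each pathname intact, spaces and quotes included.
find "$tmpdir" -type f -print0 | xargs -0 ls -ld

# Same pipeline, counting the lines ls printed (one per file).
listed=$(find "$tmpdir" -type f -print0 | xargs -0 ls -ld | wc -l)
echo "files listed: $((listed))"

rm -rf "$tmpdir"
```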
