On Mon, Apr 03, 2023 at 12:50:02PM -0700, David Christensen wrote:
> On 4/3/23 11:47, Greg Wooledge wrote:
> > Might be cleaner just to rewrite it from scratch.  Especially since
> > it mixes multiple invocations of perl together with (unsafe!) xargs and
> > other shell commands....
> >
> Please clarify "unsafe" and describe "safe" alternative(s).

The standard POSIX xargs command is completely unsuitable for use with
filenames as input, for (at least) two reasons:

1) Despite popular belief, xargs does not split input into lines.  It
   splits input into *words*, using any whitespace as delimiters.
   Therefore it fails if any of the input filenames contains whitespace.

unicorn:~$ echo /stuff/music/Frank_Zappa/15\ The\ Return\ Of\ The\ Son\ Of\ Monster\ Magnet.mp3 | xargs ls -ld
ls: cannot access '/stuff/music/Frank_Zappa/15': No such file or directory
ls: cannot access 'The': No such file or directory
ls: cannot access 'Return': No such file or directory
ls: cannot access 'Of': No such file or directory
ls: cannot access 'The': No such file or directory
ls: cannot access 'Son': No such file or directory
ls: cannot access 'Of': No such file or directory
ls: cannot access 'Monster': No such file or directory
ls: cannot access 'Magnet.mp3': No such file or directory

2) xargs also treats quotes (single or double) in the input stream as
   word delimiters, overriding the whitespace delimiters.  So, a
   filename that contains a quoted section will be handled even more
   surprisingly:

unicorn:~$ echo 'foo "bar b q" baz.txt' | xargs ls -ld
ls: cannot access 'bar b q': No such file or directory
ls: cannot access 'baz.txt': No such file or directory
-rwxr-xr-x 1 greg greg 386 Apr  3 14:39 foo

   If the filename doesn't contain a balanced pair of quote marks, then
   it simply explodes.

unicorn:~$ echo "You can't do that on television.mp4" | xargs ls -ld
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
ls: cannot access 'You': No such file or directory

All together, any filename with whitespace OR a single quote OR a double
quote will break POSIX xargs.  There is no mitigation using the POSIX
option set.  None.  At all.  The command is unsuitable for general use.

The GNU version of xargs, however, adds a -0 option:

       -0, --null
              Input items are terminated by a null character instead of
              by whitespace, and the quotes and backslash are not
              special (every character is taken literally).  Disables
              the end of file string, which is treated like any other
              argument.  Useful when input items might contain white
              space, quote marks, or backslashes.  The GNU find -print0
              option produces input suitable for this mode.

With this option, you can supply a stream of NUL-delimited filenames to
xargs -0, and process them safely.  No explosions will occur, no matter
what filenames are passed.

Feeding filenames to xargs -0 is usually done with either find -print0
(which is another GNU extension, also supported on modern BSD), or with
something equivalent to printf '%s\0'.  Other input sources are possible,
but those are the big two.
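
For illustration, here is a minimal sketch of both input sources.  The
scratch directory and the three filenames are made up for this example,
and it assumes an xargs that supports -0 (GNU or modern BSD) plus a
mktemp that supports -d.  Treat it as a demonstration, not as the only
way to do this.

```shell
#!/bin/sh
# Hypothetical scratch directory and filenames, invented for this demo.
# Between them they contain spaces, a single quote and double quotes,
# which are exactly the characters that break plain xargs.
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/two words.txt" "$tmpdir/can't.mp4" "$tmpdir/say \"hi\".txt"

# Source 1: find -print0 writes each pathname followed by a NUL byte.
find "$tmpdir" -type f -print0 | xargs -0 ls -ld

# Source 2: printf '%s\0' writes each of its arguments followed by a
# NUL byte.  Here the arguments come from a glob, which the shell
# expands without any word splitting.
printf '%s\0' "$tmpdir"/* | xargs -0 ls -ld

# Sanity check: count the arguments that xargs -0 actually passes on.
# Each of the three files should arrive as exactly one argument.
count=$(find "$tmpdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "arguments passed by xargs -0: $count"

rm -rf "$tmpdir"
```

Both ls invocations should list all three files without any "cannot
access" or "unmatched single quote" errors, and the count should be 3.
Running the same pipelines with -print in place of -print0 and without
-0 is an easy way to reproduce the failures shown earlier.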