Without commenting here on -magic/-mime, i.e., just to discuss the given statements about what is possible today:
On 5/25/23 21:18, anonymous wrote:
> Currently, with find:
> We need xargs and sed, and so have to worry about whitespace in paths and
> filenames; we are also spawning several sub-commands.
> find -type f | xargs file | sed -n 's/:.*PE32 executable.*//p' | xargs my_command
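To make the whitespace concern in the quote concrete, here is a minimal shell sketch. It assumes GNU find and xargs, and the file names are hypothetical. It shows how xargs, reading its default blank- and newline-separated input, splits a name containing a space into several arguments:

```shell
#!/bin/sh
# Sketch: why a plain 'find | xargs' pipeline has to worry about whitespace.
# Assumes GNU find and xargs; the file names are made up for illustration.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Two hypothetical files whose names contain a space.
touch "$dir/a b.exe" "$dir/c d.exe"

# By default xargs splits its input on blanks and newlines (and also
# interprets quotes and backslashes), so each name becomes two arguments.
# The inline script prints one line per argument it receives.
naive=$(find "$dir" -type f | xargs sh -c 'for f in "$@"; do echo x; done' sh | wc -l)

printf '2 files, but the command received %s arguments\n' "$naive"
```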
With find(1), one does not have to "worry about whitespace". There are several
ways to stay on the safe side:

- executing per file (which may be inefficient):
  $ find ... -exec $TOOL '{}' ';'

- bulk execution:
  $ find ... -exec $TOOL '{}' +

- if $TOOL understands zero-separated input (like e.g. grep -z):
  $ find ... -print0 | $TOOL -z

- else:
  $ find ... -print0 | xargs -r0 $TOOL

Re. file(1): unfortunately, this tool does not accept zero-separated input.
It does have a --files-from option, but that reads one name per line.
For the search case, it would also come in handy if file(1) had a
--filter=PATTERN option, and furthermore a way to print only the names of the
files matching the pattern, for safe post-processing in other tools.

Today, one could efficiently and safely use something like this to find files
for which file(1) returns a magic string matching PATTERN:

  $ find ... -exec file -00 '{}' + \
      | sed -nz 'h;n; /PATTERN/{g;p}' \
      | xargs -0 my_command

With -00, file(1) terminates both the file name and its description with a
NUL, so sed -z sees them as alternating records: 'h' saves the name, 'n' reads
the description, and only if that matches does 'g;p' restore and print the name.

Here's an example that filters on regular files smaller than 40000 bytes and
modified within the last 24 hours, then lets the "file ...|sed ..." pipe filter
for the wanted magic string "C source", and finally continues the search in a
subsequent find(1) command:

  $ find -type f -size -40000c -mtime -1 -exec file -00 '{}' + \
      | sed -nz 'h;n;/^C source/{g;p}' \
      | find/find -files0-from - -ls

Obviously, the file(1) run is always by far the most expensive part, because it
has to read all the files, but at least it is spawned as few times as possible,
which also minimizes the number of times the magic file has to be loaded.

Have a nice day,
Berny
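The safe idioms and the sed filter discussed above can be tried with the following self-contained sketch. It assumes GNU find, GNU xargs (-r, -0), GNU sed (-z) and GNU tr, and the file names are hypothetical. Because file(1) may not be installed, a small printf helper stands in for 'file -00'. It emits each name and an invented description, each terminated by a NUL, so only the sed part of the pipeline is exercised, not file(1) itself.

```shell
#!/bin/sh
# Sketch of the NUL-safe find idioms and the "file -00 | sed -z" filter.
# Assumes GNU find, GNU xargs (-r, -0), GNU sed (-z) and GNU tr.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical files: a plain name, a name with a space, a name with a newline.
touch "$dir/plain.c" "$dir/with space.c" "$dir/with
newline.txt"

# Inline script run via find/xargs: prints one line per argument it receives.
count_args='for f in "$@"; do echo x; done'

# Bulk execution: find hands the exact names to 'sh -c' as arguments.
n_exec=$(find "$dir" -type f -exec sh -c "$count_args" sh '{}' + | wc -l)

# NUL-separated names: xargs -r0 splits only on NUL, never on blanks.
n_xargs=$(find "$dir" -type f -print0 | xargs -r0 sh -c "$count_args" sh | wc -l)

# Stand-in for 'file -00' output: name NUL description NUL, per file.
# The descriptions are invented; real ones come from file(1).
fake_file_output() {
  printf '%s\0%s\0' \
    "$dir/plain.c" 'C source, ASCII text' \
    "$dir/with space.c" 'C source, ASCII text' \
    "$dir/with
newline.txt" 'ASCII text'
}

# h: save the name; n: read its description; on a match, g restores the
# saved name and p prints it, still NUL-terminated. The final tr only
# turns NULs into newlines for display.
matches=$(fake_file_output | sed -nz 'h;n;/^C source/{g;p}' | tr '\0' '\n')

printf 'exec +: %s files, xargs -r0: %s files\n' "$n_exec" "$n_xargs"
printf 'C source files:\n%s\n' "$matches"
```

Both NUL-safe variants see exactly three files, including the one with a newline in its name. The sed filter keeps only the two names whose (invented) description starts with "C source".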