Re: [PATCH] ls: add --files0-from=FILE option

Carl Edquist via GNU coreutils General Discussion Mon, 26 Apr 2021 18:10:52 -0700

On Wed, 21 Apr 2021, Carl Edquist wrote:

Thanks Pádraig for the thoughtful reply!


You bring up some good points, which for the sake of interesting
discussion i'd like to follow up on also.  (Maybe later this week...)

So to follow up - i don't have any action items here, just some details that i thought might make for interesting discussion.



On Tue, 20 Apr 2021, p...@draigbrady.com wrote:

 In this case, xargs batches *13* separate invocations of ls; so the
 overall sorting is completely lost.

 But with the new option:

      [linux]$ find -name '*.[ch]' -print0 | ls -lrSh --files0-from=-

 The sizes all scroll in order


One can also implement this functionality with the DSU pattern like:

  nlargest=10
  find . -printf '%s\t%p\0' |
  sort -z -k1,1n | tail -z -n"$nlargest" | cut -z -f2 |
  xargs -r0 ls -lUd --color=auto --


I appreciate you taking the time to write out a full DSU example.

I was going to save this topic for a separate thread, but yeah one of my observations over the years is i would see people make (i would say oversimplified) comments that "you can do that with find", but they rarely go through the effort to show it (and thus demonstrate just how much more complicated the equivalent find-based solution is to type out).


(More in this vein a bit later.)

Arguably that's more scalable as the sort operation will not
be restricted by available RAM, and will use temp storage.


Agreed!  This is a good point to keep in mind in general for ls.

On the one hand, ls already has file metadata information available to it in raw format (eg, size, timestamps), so there isn't the overhead to convert relevant keys to sortable strings and write it all across pipes for IPC.

But on the other hand, as you are saying, for the things that find(1) can give sort(1) to sort (which is _most_ of the ls --sort modes), sort(1) can sort a bit more efficiently and can handle situations with very many filenames and very limited system memory.


But by the same argument,

    find . -mindepth 1 -maxdepth 1 ! -name .\* -printf '%f\0' |
    sort -z | xargs -r0 ls $LS_OPTIONS -dU --

is more scalable than a bare 'ls' invocation, since there can be literally millions of entries in a single directory.


But 'ls' is certainly easier to type.

And if you want to sort on anything more interesting than the name itself, the DSU pipeline just gets more complex.

So at some point you can weigh scalability vs usability based on the situation.

(In the linux tree example above, the "ls -lrSh --files0-from=-" run with 45648 input source files has a maxrss of ~16M, so for instance this use case is small enough for me not to worry about mem use.)

[By the same token, the sorting of shell globs in general (which in bash can be expanded to an arbitrary number of arguments), can be more scalably done outside of the shell (with find|sort) than in a shell glob itself. And while that is not a coreutils issue, the point is that despite this, (as with ls) shell globs can still be more convenient to use for sorting a set of files, perhaps in most cases, than an equivalent multi-tool pipeline.]

But doing the sorting in ls (rather than find|sort|cut|xargs) is not just easier to type -- it's also easier to _get it right_ without a lot of trial and error.


And for casual interactive use, that's kind of a feature, too.

For instance, in your DSU example, i spy a subtle bug in the undecorate step: looks like it should be "cut -zf2-" instead of "cut -zf2" ... because after all, tabs are legal in filenames, too.
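
To make the difference concrete, here is a minimal sketch of the undecorate step on a single made-up record. The size and path are hypothetical, and it assumes GNU cut (for -z); the only point is what each field selection leaves behind when the name itself contains a tab.

```shell
#!/bin/sh
# Hypothetical decorated record "<size><TAB><path>", where the path
# itself contains a tab character.
tab=$(printf '\t')
record="123${tab}dir/with${tab}tab.c"

# Undecorate with -f2: only the second tab-separated field survives,
# so the name is cut short at its embedded tab.
truncated=$(printf '%s\0' "$record" | cut -z -f2 | tr -d '\0')

# Undecorate with -f2-: field 2 through the end of the record,
# embedded tab included.
intact=$(printf '%s\0' "$record" | cut -z -f2- | tr -d '\0')

printf 'cut -z -f2 : %s\n' "$truncated"
printf 'cut -z -f2-: %s\n' "$intact"
```

With -f2 the name handed on to xargs would no longer refer to the original file, so ls would either complain or (worse) list some other file.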

The DSU in pixelbeat's "newest" script (mentioned later) is also not free from filename handling issues [1].

(Point being - even for experienced users, writing a replacement for ls's internal sorting with an equivalent DSU pipeline can be tricky & time consuming to get right, and easy to get subtly wrong.)

Also there are some other subtle differences that might be worth keeping in mind:

- To get the same sort order as ls, you actually need "sort -rzk1,1n" instead of just "sort -zk1,1n", since for tie-breaking ls sorts the keys and the names in opposite directions for ls -t and -S (whether or not you pass -r to ls).

(Likewise if you want the same order as ls -S without the -r, you need a slightly different "sort -zk1,1nr".)

- when you put 'ls' after a pipe (as in "... | ls -lrSh --files0-from=-") you typically get the alias (eg, slackware has ls='/bin/ls $LS_OPTIONS'). Meanwhile 'xargs ls' is whichever 'ls' is first in PATH, and does not include LS_OPTIONS. Not a big deal, but, buyer beware.

- lastly (and perhaps the only thing you can't do anything about), when you use 'xargs ls' rather than a single ls invocation, you lose the aggregate width alignments for any ls formats with column output (in this case -l, but it's also true for -C and -x).
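
The tie-breaking point in the first item above can be checked with a small sketch. The three records are made up (two of them tie on size), GNU sort and cut are assumed for -z, and LC_ALL=C is set only to keep the name comparison predictable. Because the key carries its own 'n' ordering option, the global -r is not inherited by the key and only flips the last-resort comparison of whole records.

```shell
#!/bin/sh
# Hypothetical "<size><TAB><name>" records; names a and b tie on size.
records() { printf '%s\t%s\0' 100 b 100 a 50 c; }

# Sizes ascending, ties broken by name ascending.
plain=$(records | LC_ALL=C sort -z -k1,1n | cut -z -f2- | tr '\0' '\n')

# Sizes still ascending (the key has its own 'n' option), but ties
# now come out in reverse name order, as described for ls -rS.
reversed_ties=$(records | LC_ALL=C sort -rz -k1,1n | cut -z -f2- | tr '\0' '\n')

printf 'sort -zk1,1n:\n%s\n' "$plain"
printf 'sort -rzk1,1n:\n%s\n' "$reversed_ties"
```

Both runs list c first; they differ only in the order of the two tied names.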

Also it's more robust as ls is the last step in the pipeline,
presenting the files to the user. If you wanted the largest 10
with ls --files0-from then file names containing a newline
would mess up further processing by tail etc.

So in the examples i gave, i actually had in mind to output the entire listing to the terminal, so it can be scrolled through for review.

(That is, without filtering, with say 'tail'.)

And the biggest or newest files are displayed prominently at the end of the listing, which is what is visible when control is returned to the user. From there i just copied off the final 5 entries.

I was actually very careful in my examples to avoid the question of doing any kind of processing of output from ls. But since you brought it up, for what it's worth i'll note a couple small thoughts:

- in my examples, i had in mind the default terminal output, which already prints '?' for nongraphic characters. So, if i wanted (as in your example) just the last 10 items, but as a *post* processing step, i would first have to add the ls -q option to retain the '?' replacement for non-tty output.

At that point, piping to 'tail' (without -z) would not be messed up at all by file names containing newlines, since they'd all be replaced with '?'s. So in that sense, robustness is actually _not_ affected.

- nevertheless, i do expect putting the 'tail' step before ls to be a bit more efficient, since it avoids ls printing the long listing for files that won't make it into the final tail output. ('ls -lrShq | tail' prints the long-format info even for lines that will be discarded.)
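
The -q point above can be seen in a throwaway directory. This is only a sketch: the file names are made up, and it assumes that ls writes names literally when its output is a pipe (the default, unless something like QUOTING_STYLE says otherwise).

```shell
#!/bin/sh
# Scratch directory with one ordinary file and one file whose name
# contains an embedded newline (both names are hypothetical).
tmpdir=$(mktemp -d)
: > "$tmpdir/plain"
: > "$tmpdir/one
two"

# Piped without -q, the raw newline goes through: two files, three lines.
raw_lines=$(ls "$tmpdir" | wc -l)

# Piped with -q, the newline is printed as '?': one line per file.
q_lines=$(ls -q "$tmpdir" | wc -l)

rm -rf "$tmpdir"
printf 'ls    | wc -l: %s\n' "$raw_lines"
printf 'ls -q | wc -l: %s\n' "$q_lines"
```

With one line per file, a line-oriented 'tail' after 'ls -q' keeps whole entries, though of course the '?' names are only good for looking at, not for feeding to another command.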

Similarly, say you would like to view / scroll through your extensive mp3
collection in chronological order (based on when you added the files to
your collection).  You can do it now with the new option:

    [music]$ find -name \*.mp3 -print0 | ls -lrth --files0-from=-


I've used a https://www.pixelbeat.org/scripts/newest script
for a long time with similar find|xargs technique as I described above.


Fun!

Although it looks like this script, despite using 'xargs -0', is actually a good example of handling paths with newlines *incorrectly*:


[1] https://github.com/pixelb/scripts/blob/e337b59/scripts/newest#L68-L70

if [ ! -p /proc/self/fd/1 ]; then
   tr '\n' '\0' |
   xargs -r0 ls -lUd --color=auto --


*wince*  :)
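
To spell out why that hurts, here is a minimal sketch that counts how many NUL-terminated records a single name turns into. The file name is hypothetical and the counting pipeline is just for illustration (it is not part of the script); it assumes a tr that accepts '\0'.

```shell
#!/bin/sh
# Hypothetical file name with an embedded newline.
name=$(printf 'new\nline.mp3')

# Newline-terminated record converted afterwards with tr, as in the
# snippet above: the embedded newline becomes a NUL as well.
split_count=$(printf '%s\n' "$name" | tr '\n' '\0' | tr -cd '\0' | wc -c)

# NUL-terminated from the start (as with find -print0): one record.
nul_count=$(printf '%s\0' "$name" | tr -cd '\0' | wc -c)

printf 'records after tr: %s\n' "$split_count"
printf 'records with \\0 : %s\n' "$nul_count"
```

So by the time xargs -0 sees the stream, the one name has already become two arguments, and neither of them names the original file.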

In saying all of the above, I do agree though that for consistency
commands that need to process all arguments with global context,
should have a --files0-from option.
Currently that's du and wc for total counts, and sort(1) for sorting.
Since ls has sorting functionality, it should have this option too.

Yeah, it's not that ls is so desperately in need of this option; but for completeness, it has sometimes felt like a bit of a missing feature.



Thanks again for your consideration!

Carl
