On Wed, 21 Apr 2021, Carl Edquist wrote:
Thanks Pádraig for the thoughtful reply!
You bring up some good points, which I'd like to follow up on as well, for
the sake of interesting discussion. (Maybe later this week...)
So, to follow up - I don't have any action items here, just some details
that I thought might make for interesting discussion.
On Tue, 20 Apr 2021, p...@draigbrady.com wrote:
In this case, xargs batches *13* separate invocations of ls; so the
overall sorting is completely lost.
But with the new option:
[linux]$ find -name '*.[ch]' -print0 | ls -lrSh --files0-from=-
The sizes all scroll in order
One can also implement this functionality with the DSU pattern like:
nlargest=10
find . -printf '%s\t%p\0' |
sort -z -k1,1n | tail -z -n"$nlargest" | cut -z -f2 |
xargs -r0 ls -lUd --color=auto --
I appreciate you taking the time to write out a full DSU example.
I was going to save this topic for a separate thread, but yes - one of my
observations over the years is that people make (I would say
oversimplified) comments like "you can do that with find", but rarely go
through the effort of actually showing it (and thus of demonstrating just
how much more complicated the equivalent find-based solution is to type
out).
(More in this vein a bit later.)
Arguably that's more scalable as the sort operation will not
be restricted by available RAM, and will use temp storage.
Agreed! This is a good point to keep in mind in general for ls.
On the one hand, ls already has the file metadata available to it in raw
form (e.g., size, timestamps), so there isn't the overhead of converting
the relevant keys to sortable strings and writing it all across pipes for
IPC.
But on the other hand, as you are saying, for the things that find(1) can
give sort(1) to sort (which is _most_ of the ls --sort modes), sort(1) can
sort a bit more efficiently and can handle situations with very many file
names and very limited system memory.
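(For what it's worth, GNU sort also lets you cap its in-memory buffer and
choose where the spill files go, which makes that temp-storage fallback
explicit. A minimal sketch, using sort's -S/-T options:)

```shell
# Cap sort's memory buffer at 64K and point any spill files at /tmp;
# with large inputs, sort automatically falls back to temp-file merge
# passes instead of holding everything in RAM.
printf '3\n1\n2\n' | sort -n -S 64K -T /tmp
# prints 1, 2, 3
```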
But by the same argument,
find . -mindepth 1 -maxdepth 1 ! -name .\* -printf '%f\0' |
sort -z | xargs -r0 ls $LS_OPTIONS -dU --
is more scalable than a bare 'ls' invocation, since there can be literally
millions of entries in a single directory.
But 'ls' is certainly easier to type.
And if you want to sort on anything more interesting than the name itself,
the DSU pipeline just gets more complex.
So at some point you can weigh scalability vs usability based on the
situation.
(In the linux tree example above, the "ls -lrSh --files0-from=-" run with
45648 input source files has a maxrss of ~16M, so for instance this use
case is small enough for me not to worry about mem use.)
[By the same token, the sorting of shell globs in general (which in bash
can expand to an arbitrary number of arguments) can be done more scalably
outside of the shell (with find|sort) than by the glob expansion itself.
And while that is not a coreutils issue, the point is that, despite this,
shell globs (as with ls) can still be more convenient for sorting a set of
files - perhaps in most cases - than an equivalent multi-tool pipeline.]
But doing the sorting in ls (rather than find|sort|cut|xargs) is not just
easier to type -- it's also easier to _get it right_ without a lot of
trial and error.
And for casual interactive use, that's kind of a feature, too.
For instance, in your DSU example, I spy a subtle bug in the undecorate
step: it looks like it should be "cut -zf2-" instead of "cut -zf2" ...
because, after all, tabs are legal in filenames, too.
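(A quick way to see the difference, with a hypothetical decorated record
whose file name itself contains a tab:)

```shell
# Decorated record "<size>\t<name>" where the name is "foo<TAB>bar";
# cut -f2 keeps only the text up to the name's embedded tab,
# while -f2- keeps everything after the size field.
printf '123\tfoo\tbar\0' | cut -z -f2  | tr '\0' '\n'   # foo
printf '123\tfoo\tbar\0' | cut -z -f2- | tr '\0' '\n'   # foo<TAB>bar
```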
The DSU in pixelbeat's "newest" script (mentioned later) is also not free
from filename handling issues [1].
(Point being - even for experienced users, writing a replacement for ls's
internal sorting as an equivalent DSU pipeline can be tricky and time
consuming to get right, and easy to get subtly wrong.)
Also there are some other subtle differences that might be worth keeping
in mind:
- To get the same sort order as ls, you actually need "sort -rzk1,1n"
instead of just "sort -zk1,1n", since for tie-breaking ls sorts the
keys and the names in opposite directions for ls -t and -S (whether or not
you pass -r to ls).
(Likewise if you want the same order as ls -S without the -r, you need a
slightly different "sort -zk1,1nr".)
- When you put 'ls' after a pipe (as in "... | ls -lrSh --files0-from=-"),
you typically get the shell alias (e.g., Slackware has
ls='/bin/ls $LS_OPTIONS'). Meanwhile, 'xargs ls' runs whichever 'ls' is
first in PATH, and does not include LS_OPTIONS. Not a big deal, but buyer
beware.
- Lastly (and perhaps the only thing you can't do anything about), when
you use 'xargs ls' rather than a single ls invocation, you lose the
aggregate width alignment for any ls format with column output (in this
case -l, but it's also true for -C and -x).
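Putting the fixes above together (the "cut -zf2-" undecorate and the
reversed tie-break), a corrected sketch of the earlier nlargest pipeline
might look like:

```shell
nlargest=10
find . -printf '%s\t%p\0' |
  sort -rz -k1,1n |   # key 1 stays numeric ascending; the global -r only
                      # reverses the whole-line tie-break, matching ls -lrS
  tail -z -n"$nlargest" |
  cut -z -f2- |       # -f2- keeps all fields after the size, so tabs
                      # embedded in file names survive undecorating
  xargs -r0 ls -lUd --color=auto --
```

Note the 'xargs ls' caveats still apply: the alias/LS_OPTIONS difference
and the loss of aggregate column alignment are inherent to the multi-tool
approach.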
Also it's more robust as ls is the last step in the pipeline,
presenting the files to the user. If you wanted the largest 10
with ls --files0-from then file names containing a newline
would mess up further processing by tail etc.
So in the examples I gave, I actually had in mind outputting the entire
listing to the terminal, so it can be scrolled through for review.
(That is, without filtering with, say, 'tail'.)
And the biggest or newest files are displayed prominently at the end of
the listing, which is what is visible when control is returned to the
user. From there I just copied off the final 5 entries.
I was actually very careful in my examples to avoid the question of doing
any kind of processing of ls's output. But since you brought it up, for
what it's worth I'll note a couple of small thoughts:
- In my examples, I had in mind the default terminal output, which already
prints '?' for nongraphic characters. So, if I wanted (as in your
example) just the last 10 items, but as a *post*-processing step, I would
first have to add the ls -q option to retain the '?' replacement for
non-tty output.
At that point, piping to 'tail' (without -z) would not be messed up at all
by file names containing newlines, since they'd all be replaced with '?'s.
So in that sense, robustness is actually _not_ affected.
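(A quick illustration, using a scratch directory and a hypothetical file
name with an embedded newline:)

```shell
# ls -q keeps the '?' replacement for nongraphic characters even
# when stdout is not a tty, so 'tail' sees one line per file.
d=$(mktemp -d)
touch "$d/a
b"                  # file name is "a<newline>b"
ls -q "$d"          # prints "a?b", even when piped
ls "$d" | cat       # without -q, the raw name spans two lines
```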
- Nevertheless, I do expect putting the 'tail' step before ls to be a bit
more efficient, since it avoids ls printing the long listing for files
that won't make it into the final tail output. ('ls -lrShq | tail' prints
the long-format info even for lines that will be discarded.)
Similarly, say you would like to view / scroll through your extensive mp3
collection in chronological order (based on when you added the files to
your collection). You can do it now with the new option:
[music]$ find -name \*.mp3 -print0 | ls -lrth --files0-from=-
I've used a https://www.pixelbeat.org/scripts/newest script
for a long time, with a similar find|xargs technique to the one I
described above.
Fun!
Although it looks like this script, despite using 'xargs -0', is actually
a good example of handling paths with newlines *incorrectly*:
[1] https://github.com/pixelb/scripts/blob/e337b59/scripts/newest#L68-L70
if [ ! -p /proc/self/fd/1 ]; then
    tr '\n' '\0' |
    xargs -r0 ls -lUd --color=auto --
*wince* :)
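(The failure mode is easy to demonstrate with a hypothetical name - tr
turns the newline *inside* a file name into a record separator, so xargs
sees two arguments:)

```shell
# One file name, "a<newline>b", as it would appear on a find -print
# stream; tr splits it into two NUL-terminated records.
printf 'a\nb\n' | tr '\n' '\0' | xargs -r0 printf '<%s>'
# prints "<a><b>" - two arguments where there should be one
```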
In saying all of the above, I do agree though that for consistency
commands that need to process all arguments with global context,
should have a --files0-from option.
Currently that's du and wc for total counts, and sort(1) for sorting.
Since ls has sorting functionality, it should have this option too.
Yeah, it's not that ls is so desperately in need of this option; but for
completeness, it has sometimes felt like a bit of a missing feature.
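(For reference, the existing option behaves consistently in wc - e.g.,
with a couple of scratch files, the per-file counts are followed by an
aggregate total:)

```shell
# wc reads the NUL-separated file list from stdin and appends a
# grand-total line computed across all of the named files.
d=$(mktemp -d)
printf 'hello' > "$d/one"       # 5 bytes
printf 'worlds!' > "$d/two"     # 7 bytes
printf '%s\0' "$d/one" "$d/two" | wc -c --files0-from=-
# last line reads "12 total"
```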
Thanks again for your consideration!
Carl