Dear Coreutils Maintainers,
I'd like to introduce my favorite 'ls' option, '-W', which I have been
enjoying using regularly over the last few years.
The concept is just to sort filenames by their printed widths.
(If this sounds odd, I invite you to hear it out, try it, and see for yourself!)
I am including a patch with my implementation and accompanying tests - as
well as some sample output. And I'll happily field any requests for
improvements.
But first, some motivation...
The main use case for me has to do with managing filenames in directories,
as they are displayed by 'ls' itself.
There is a usual tidy/untidy cycle for me in my homedir, or various other
user-managed directories where files tend to accumulate.
The "tidying" part of the cycle involves organizing files into subdirs
until a bare 'ls' invocation fits comfortably within a window (e.g. 80x24).
(I feel that the 'ls' output is optimally useful when it fits into a
single window, to see and reason about the entire directory's contents at
once.)
Then over time, random files accumulate, with names of various lengths;
and before you know it, the output of 'ls' is several window-heights tall.
(And I feel the usefulness of the 'ls' column output drops off significantly
when you can't see the entire listing in a single window.)
The _various lengths_ part is significant here, because a longer filename
makes the entire column it appears in wider. So if you have long
filenames mixed in with shorter ones, you end up with mostly whitespace in
the 'ls' output. (Which is also to say, filenames become inefficiently
"packed" in the column display.)
When this is sufficiently annoying to motivate tidying up again, the first
step is to identify the long filenames, which are making a mess
of the otherwise-nice default 'ls' column output. Tucking just the long
ones into subdirs (or just renaming them to something shorter) is a quick
way to condense the directory listing output significantly.
Originally I would identify the longest filenames in a directory with
something in the shell like:
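    # Print the given names NUL-terminated, shortest first.
    # Needs bash (brace expansion gives RS='\0' ORS='\0'), GNU sort/cut,
    # and an awk that accepts NUL as the record separator, such as gawk.
    lsort0 () {
        [ $# -eq 0 ] ||
        printf '%s\0' "$@" |
        awk '{print length, $0}' OFS='\t' {RS,ORS}='\0' |
        sort -zn | cut -zf2-
    }
    # Turn NULs into newlines; show a newline inside a name as '?'.
    zlines () { tr '\0\n' '\n?'; }
    # Show the ten longest names in the current directory.
    lsort0 * | tail -z | zlines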
This does an ok job, but it seems like a lot of tricky work to accomplish,
when 'ls' is already designed for listing files sorted by various
criteria.
Also notably, the 'length' of the filename is not quite the right thing to
measure, as it does not take into account the width of Unicode characters
(sometimes 0 or 2), nor (more generally) the actual width that gets used
when 'ls' displays it, which may include various quoting characters.
Really, only 'ls' itself has access to this information, so it can only be
done properly if the feature is built into 'ls'.
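As a small illustration of the quoting point, here is a sketch that assumes
GNU 'ls' and uses a throwaway directory. The variable names are made up for
the example. It compares the length of a name as stored with the length of
what 'ls -Q' prints for it:

```shell
# Sketch only: assumes GNU ls (-N prints names literally, -Q adds double quotes).
demo=$(mktemp -d) || exit 1
touch "$demo/a b"

raw=$(ls -N "$demo")      # a b
quoted=$(ls -Q "$demo")   # "a b"

# The quoted form is two characters wider than the name itself.
echo "${#raw} ${#quoted}"   # prints: 3 5

rm -r "$demo"
```

Measuring the name outside of 'ls' gives 3, but with this quoting style the
column layout has to reserve 5.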
An interesting observation is that if you ask 'ls' to display files in the
order of their width, you actually get an optimally-packed column display,
in the default column format mode (-C).
This helps identify the outliers for long filenames, but it also looks
neat and can easily cut in half the number of lines 'ls' takes to display
a directory.
You can get a taste for this using the 'lsort0' function defined above,
with an unpatched 'ls':
lsort0 * | xargs -0 ls -dU --color=auto
(Try it in a messy homedir! Neat, eh?)
This emulates what the new 'ls -W' does by itself.
(I provide the complicated 'printf | awk | sort | cut | xargs' pipeline,
not to demonstrate that the new 'ls -W' option is superfluous, but to
show how troublesome it is even to approximate the desired result without
the option built in to 'ls'.)
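To put a rough number on the packing claim with an unpatched 'ls', the
following sketch counts output rows at a fixed width of 80 columns. It
assumes GNU ls, seq and xargs, and names without newlines or tabs. The
'by_width' helper and the test directory contents are invented for the
illustration and measure plain byte length, not the true display width:

```shell
# Hypothetical helper: reorder newline-separated names, shortest first.
by_width () {
    awk '{ print length($0) "\t" $0 }' | sort -n | cut -f2-
}

demo=$(mktemp -d) || exit 1

# Twenty short names plus two long outliers.
for i in $(seq -w 1 20); do touch "$demo/f$i"; done
touch "$demo/a-some-obnoxiously-longish-filename" \
      "$demo/z-some-obnoxiously-longish-filename"

# Rows used by the default (alphabetical) column listing.
plain_rows=$(ls -C -w 80 "$demo" | wc -l)

# Rows used when the same names are passed to ls already ordered by width.
sorted_rows=$(cd "$demo" &&
    ls | by_width | tr '\n' '\0' | xargs -0 ls -dU -C -w 80 | wc -l)

echo "alphabetical: $plain_rows rows, by width: $sorted_rows rows"

rm -r "$demo"
```

With the outliers grouped into the last column, the short names can share
many narrow columns, so the width-ordered listing needs fewer rows.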
Additionally, 'ls -W' can be combined naturally with other 'ls' options
like '-a' or '-r', or whatever decoration options you may have defined for
your 'ls' alias in LS_OPTIONS.
So, that's what the new ls -W/--sort=width option is all about.
It helps identify the outliers for long filenames, and it also produces a
more compact display of columns when listing a directory with many entries
of various widths.
An implementation detail: this sorts files based on ls's internal
'quote_name_width' using the current filename quoting options. So it
takes into account the actual width that 'ls' displays for each entry.
And ties are still broken by the default sorting of the filename itself -
as is the case with other sort options.
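The ordering rule (width first, then name) can be approximated outside of
'ls' for plain ASCII names. This is only a sketch: 'width_then_name' is a
hypothetical helper, it counts bytes rather than display columns, and it
breaks ties in the C locale, whereas 'ls' uses its default name comparison.

```shell
# Hypothetical helper: order names by length, breaking ties by name.
width_then_name () {
    tab=$(printf '\t')
    LC_ALL=C awk '{ print length($0) "\t" $0 }' |
        LC_ALL=C sort -t "$tab" -k1,1n -k2 |
        cut -f2-
}

result=$(printf '%s\n' aaaaa bbb cccc d zz yy | width_then_name)
printf '%s\n' "$result"
# prints, one per line: d yy zz bbb cccc aaaaa
```

Note that 'yy' and 'zz' have the same width, so they come out in name order.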
If you try it and you're impressed with how neatly 'ls -W' is able to pack
the filenames into columns, at first you might almost think it must be a
new 'ls --format' option; but really all it does is change the sort order.
That's about it. Thanks for your consideration, and I hope many find this
to be as useful & enjoyable as I do.
Carl
-=-=+=-=-
* Demo! *
[coreutils/src]$ ls # normal output
basename.c expand.c make-prime-list.c shred.c
basenc.c expr.c make-prime-list.o shuf.c
blake2 extent-scan.c md5sum.c single-binary.mk
cat.c extent-scan.h mkdir.c sleep.c
chcon.c extract-magic mkfifo.c sort.c
chgrp.c factor.c mknod.c split.c
chmod.c false.c mktemp.c stat.c
chown-core.c fiemap.h mv.c statx.h
chown-core.h find-mount-point.c nice.c stdbuf.c
chown.c find-mount-point.h nl.c stty.c
chroot.c fmt.c nohup.c sum.c
cksum.c fold.c nproc.c sync.c
comm.c force-link.c numfmt.c system.h
copy.c force-link.h od.c tac-pipe.c
copy.h fs-is-local.h operand2sig.c tac.c
coreutils-arch.c fs.h operand2sig.h tail.c
coreutils-dir.c getlimits.c paste.c tee.c
coreutils-vdir.c group-list.c pathchk.c test.c
coreutils.c group-list.h pinky.c timeout.c
coreutils.h groups.c pr.c touch.c
cp-hash.c head.c primes.h tr.c
cp-hash.h hostid.c printenv.c true.c
cp.c hostname.c printf.c truncate.c
csplit.c id.c prog-fprintf.c tsort.c
cu-progs.mk install.c prog-fprintf.h tty.c
cut.c ioblksize.h ptx.c uname-arch.c
date.c join.c pwd.c uname-uname.c
dcgen kill.c readlink.c uname.c
dd.c lbracket.c realpath.c uname.h
df.c libstdbuf.c relpath.c unexpand.c
die.h link.c relpath.h uniq.c
dircolors.c ln.c remove.c unlink.c
dircolors.h local.mk remove.h uptime.c
dircolors.hin logname.c rm.c users.c
dirname.c longlong.h rmdir.c version.c
du-tests ls-dir.c runcon.c version.h
du.c ls-ls.c selinux.c wc.c
echo.c ls-vdir.c selinux.h who.c
env.c ls.c seq.c whoami.c
expand-common.c ls.h set-fields.c yes.c
expand-common.h make-prime-list set-fields.h
[coreutils/src]$ ls -W # sort by width
cp.c seq.c sync.c tsort.c stdbuf.c readlink.c extent-scan.h
dd.c sum.c tail.c uname.c system.h realpath.c extract-magic
df.c tac.c test.c uname.h unlink.c tac-pipe.c fs-is-local.h
du.c tee.c true.c users.c uptime.c truncate.c operand2sig.c
fs.h tty.c uniq.c basenc.c whoami.c unexpand.c operand2sig.h
id.c who.c chcon.c chroot.c cp-hash.c coreutils.c uname-uname.c
ln.c yes.c chgrp.c csplit.c cp-hash.h coreutils.h prog-fprintf.c
ls.c blake2 chmod.c du-tests dirname.c cu-progs.mk prog-fprintf.h
ls.h comm.c chown.c expand.c install.c dircolors.c coreutils-dir.c
mv.c copy.c cksum.c factor.c logname.c dircolors.h expand-common.c
nl.c copy.h false.c fiemap.h ls-vdir.c getlimits.c expand-common.h
od.c date.c ls-ls.c groups.c pathchk.c ioblksize.h make-prime-list
pr.c echo.c mkdir.c hostid.c relpath.c libstdbuf.c coreutils-arch.c
rm.c expr.c mknod.c local.mk relpath.h chown-core.c coreutils-vdir.c
tr.c fold.c nohup.c ls-dir.c selinux.c chown-core.h single-binary.mk
wc.c head.c nproc.c md5sum.c selinux.h force-link.c make-prime-list.c
cat.c join.c paste.c mkfifo.c timeout.c force-link.h make-prime-list.o
cut.c kill.c pinky.c mktemp.c version.c group-list.c find-mount-point.c
dcgen link.c rmdir.c numfmt.c version.h group-list.h find-mount-point.h
die.h nice.c shred.c primes.h basename.c set-fields.c
env.c shuf.c sleep.c printf.c hostname.c set-fields.h
fmt.c sort.c split.c remove.c lbracket.c uname-arch.c
ptx.c stat.c statx.h remove.h longlong.h dircolors.hin
pwd.c stty.c touch.c runcon.c printenv.c extent-scan.c
[coreutils/src]$ # accumulate some long filenames...
[coreutils/src]$ touch {a,z}-some-obnoxiously-longish-filename
[coreutils/src]$ ls # normal output, now much taller
a-some-obnoxiously-longish-filename make-prime-list.c
basename.c make-prime-list.o
basenc.c md5sum.c
blake2 mkdir.c
cat.c mkfifo.c
chcon.c mknod.c
chgrp.c mktemp.c
chmod.c mv.c
chown-core.c nice.c
chown-core.h nl.c
chown.c nohup.c
chroot.c nproc.c
cksum.c numfmt.c
comm.c od.c
copy.c operand2sig.c
copy.h operand2sig.h
coreutils-arch.c paste.c
coreutils-dir.c pathchk.c
coreutils-vdir.c pinky.c
coreutils.c pr.c
coreutils.h primes.h
cp-hash.c printenv.c
cp-hash.h printf.c
cp.c prog-fprintf.c
csplit.c prog-fprintf.h
cu-progs.mk ptx.c
cut.c pwd.c
date.c readlink.c
dcgen realpath.c
dd.c relpath.c
df.c relpath.h
die.h remove.c
dircolors.c remove.h
dircolors.h rm.c
dircolors.hin rmdir.c
dirname.c runcon.c
du-tests selinux.c
du.c selinux.h
echo.c seq.c
env.c set-fields.c
expand-common.c set-fields.h
expand-common.h shred.c
expand.c shuf.c
expr.c single-binary.mk
extent-scan.c sleep.c
extent-scan.h sort.c
extract-magic split.c
factor.c stat.c
false.c statx.h
fiemap.h stdbuf.c
find-mount-point.c stty.c
find-mount-point.h sum.c
fmt.c sync.c
fold.c system.h
force-link.c tac-pipe.c
force-link.h tac.c
fs-is-local.h tail.c
fs.h tee.c
getlimits.c test.c
group-list.c timeout.c
group-list.h touch.c
groups.c tr.c
head.c true.c
hostid.c truncate.c
hostname.c tsort.c
id.c tty.c
install.c uname-arch.c
ioblksize.h uname-uname.c
join.c uname.c
kill.c uname.h
lbracket.c unexpand.c
libstdbuf.c uniq.c
link.c unlink.c
ln.c uptime.c
local.mk users.c
logname.c version.c
longlong.h version.h
ls-dir.c wc.c
ls-ls.c who.c
ls-vdir.c whoami.c
ls.c yes.c
ls.h z-some-obnoxiously-longish-filename
make-prime-list
[coreutils/src]$ ls -W # sort by width for much denser output
cp.c copy.c rmdir.c uptime.c libstdbuf.c
dd.c copy.h shred.c whoami.c chown-core.c
df.c date.c sleep.c cp-hash.c chown-core.h
du.c echo.c split.c cp-hash.h force-link.c
fs.h expr.c statx.h dirname.c force-link.h
id.c fold.c touch.c install.c group-list.c
ln.c head.c tsort.c logname.c group-list.h
ls.c join.c uname.c ls-vdir.c set-fields.c
ls.h kill.c uname.h pathchk.c set-fields.h
mv.c link.c users.c relpath.c uname-arch.c
nl.c nice.c basenc.c relpath.h dircolors.hin
od.c shuf.c chroot.c selinux.c extent-scan.c
pr.c sort.c csplit.c selinux.h extent-scan.h
rm.c stat.c du-tests timeout.c extract-magic
tr.c stty.c expand.c version.c fs-is-local.h
wc.c sync.c factor.c version.h operand2sig.c
cat.c tail.c fiemap.h basename.c operand2sig.h
cut.c test.c groups.c hostname.c uname-uname.c
dcgen true.c hostid.c lbracket.c prog-fprintf.c
die.h uniq.c local.mk longlong.h prog-fprintf.h
env.c chcon.c ls-dir.c printenv.c coreutils-dir.c
fmt.c chgrp.c md5sum.c readlink.c expand-common.c
ptx.c chmod.c mkfifo.c realpath.c expand-common.h
pwd.c chown.c mktemp.c tac-pipe.c make-prime-list
seq.c cksum.c numfmt.c truncate.c coreutils-arch.c
sum.c false.c primes.h unexpand.c coreutils-vdir.c
tac.c ls-ls.c printf.c coreutils.c single-binary.mk
tee.c mkdir.c remove.c coreutils.h make-prime-list.c
tty.c mknod.c remove.h cu-progs.mk make-prime-list.o
who.c nohup.c runcon.c dircolors.c find-mount-point.c
yes.c nproc.c stdbuf.c dircolors.h find-mount-point.h
blake2 paste.c system.h getlimits.c a-some-obnoxiously-longish-filename
comm.c pinky.c unlink.c ioblksize.h z-some-obnoxiously-longish-filename
From dc7cd08682a7618e1bb2ef9764960e39de14237f Mon Sep 17 00:00:00 2001
From: Carl Edquist <edqu...@cs.wisc.edu>
Date: Fri, 26 Mar 2021 04:27:54 -0500
Subject: [PATCH] ls: add --sort=width (-W) option to sort by filename width
This helps identify the outliers for long filenames, and also produces
a more compact display of columns when listing a directory with many
entries of various widths.
* src/ls.c (sort_type, sort_types, sort_width): New sort_width sort
type.
(sort_args): Add "width" sort arg.
(decode_switches): Parse '-W' option.
(cmp_width, fileinfo_width): New sort function and helper for filename
width.
(quote_name_width): Add function prototype declaration.
(usage): Document -W/--sort=width option.
* doc/coreutils.texi: Document -W/--sort=width option.
* tests/local.mk: Add new test.
* tests/ls/sort-width_W-option.sh: Exercise --sort=width and -W options.
* NEWS: Mention the new feature.
---
NEWS | 2 ++
doc/coreutils.texi | 7 ++++++
src/ls.c | 36 ++++++++++++++++++++++++---
tests/local.mk | 1 +
tests/ls/sort-width_W-option.sh | 43 +++++++++++++++++++++++++++++++++
5 files changed, 85 insertions(+), 4 deletions(-)
create mode 100755 tests/ls/sort-width_W-option.sh
diff --git a/NEWS b/NEWS
index 802f4b427..4ba164e85 100644
--- a/NEWS
+++ b/NEWS
@@ -70,6 +70,8 @@ GNU coreutils NEWS -*- outline -*-
ls --classify now supports the "always", "auto", or "never" flags,
to support only outputting classifier characters if connected to a tty.
+ ls now accepts the --sort=width (-W) option, to sort by filename width.
+
nl --line-increment can now take a negative number to decrement the count.
** Improvements
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 06ecdd74c..0c7bb8d44 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -7939,6 +7939,13 @@ Sort by version name and number, lowest first. It behaves like a default
sort, except that each sequence of decimal digits is treated numerically
as an index/version number. (@xref{Version sort ordering}.)
+@item -W
+@itemx --sort=width
+@opindex -W
+@opindex --sort
+@opindex width@r{, sorting option for @command{ls}}
+Sort by printed width of filenames.
+
@item -X
@itemx --sort=extension
@opindex -X
diff --git a/src/ls.c b/src/ls.c
index 2d0450e54..12ea550e3 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -307,6 +307,10 @@ static void parse_ls_color (void);
static void getenv_quoting_style (void);
+static size_t quote_name_width (const char *name,
+ struct quoting_options const *options,
+ int needs_general_quoting);
+
/* Initial size of hash table.
Most hierarchies are likely to be shallower than this. */
#define INITIAL_TABLE_SIZE 30
@@ -475,6 +479,7 @@ enum sort_type
sort_none = -1, /* -U */
sort_name, /* default */
sort_extension, /* -X */
+ sort_width, /* -W */
sort_size, /* -S */
sort_version, /* -v */
sort_time, /* -t */
@@ -903,11 +908,11 @@ ARGMATCH_VERIFY (format_args, format_types);
static char const *const sort_args[] =
{
- "none", "time", "size", "extension", "version", NULL
+ "none", "time", "size", "extension", "version", "width", NULL
};
static enum sort_type const sort_types[] =
{
- sort_none, sort_time, sort_size, sort_extension, sort_version
+ sort_none, sort_time, sort_size, sort_extension, sort_version, sort_width
};
ARGMATCH_VERIFY (sort_args, sort_types);
@@ -1958,7 +1963,7 @@ decode_switches (int argc, char **argv)
{
int oi = -1;
int c = getopt_long (argc, argv,
- "abcdfghiklmnopqrstuvw:xABCDFGHI:LNQRST:UXZ1",
+ "abcdfghiklmnopqrstuvw:xABCDFGHI:LNQRST:UWXZ1",
long_options, &oi);
if (c == -1)
break;
@@ -2155,6 +2160,11 @@ decode_switches (int argc, char **argv)
sort_type_specified = true;
break;
+ case 'W':
+ sort_type = sort_width;
+ sort_type_specified = true;
+ break;
+
case 'X':
sort_type = sort_extension;
sort_type_specified = true;
@@ -3877,6 +3887,20 @@ cmp_extension (struct fileinfo const *a, struct fileinfo const *b,
return diff ? diff : cmp (a->name, b->name);
}
+static inline size_t
+fileinfo_width (struct fileinfo const *f)
+{
+ return quote_name_width (f->name, filename_quoting_options, f->quoted);
+}
+
+static inline int
+cmp_width (struct fileinfo const *a, struct fileinfo const *b,
+ int (*cmp) (char const *, char const *))
+{
+ int diff = fileinfo_width (a) - fileinfo_width (b);
+ return diff ? diff : cmp (a->name, b->name);
+}
+
DEFINE_SORT_FUNCTIONS (ctime, cmp_ctime)
DEFINE_SORT_FUNCTIONS (mtime, cmp_mtime)
DEFINE_SORT_FUNCTIONS (atime, cmp_atime)
@@ -3884,6 +3908,7 @@ DEFINE_SORT_FUNCTIONS (btime, cmp_btime)
DEFINE_SORT_FUNCTIONS (size, cmp_size)
DEFINE_SORT_FUNCTIONS (name, cmp_name)
DEFINE_SORT_FUNCTIONS (extension, cmp_extension)
+DEFINE_SORT_FUNCTIONS (width, cmp_width)
/* Compare file versions.
Unlike all other compare functions above, cmp_version depends only
@@ -3936,6 +3961,7 @@ static qsortFunc const sort_functions[][2][2][2] =
{
LIST_SORTFUNCTION_VARIANTS (name),
LIST_SORTFUNCTION_VARIANTS (extension),
+ LIST_SORTFUNCTION_VARIANTS (width),
LIST_SORTFUNCTION_VARIANTS (size),
{
@@ -5454,7 +5480,8 @@ Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.\n\
-S sort by file size, largest first\n\
--sort=WORD sort by WORD instead of name: none (-U), size (-S)\
,\n\
- time (-t), version (-v), extension (-X)\n\
+ time (-t), version (-v), extension (-X),\n\
+ width (-W)\n\
--time=WORD change the default of using modification times;\n\
access time (-u): atime, access, use;\n\
change time (-c): ctime, status;\n\
@@ -5478,6 +5505,7 @@ Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.\n\
"), stdout);
fputs (_("\
-w, --width=COLS set output width to COLS. 0 means no limit\n\
+ -W sort by entry name width\n\
-x list entries by lines instead of by columns\n\
-X sort alphabetically by entry extension\n\
-Z, --context print any security context of each file\n\
diff --git a/tests/local.mk b/tests/local.mk
index 27e31ec8e..41981ffd6 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -632,6 +632,7 @@ all_tests = \
tests/ls/symlink-quote.sh \
tests/ls/symlink-slash.sh \
tests/ls/time-style-diag.sh \
+ tests/ls/sort-width_W-option.sh \
tests/ls/x-option.sh \
tests/ls/hyperlink.sh \
tests/mkdir/p-1.sh \
diff --git a/tests/ls/sort-width_W-option.sh b/tests/ls/sort-width_W-option.sh
new file mode 100755
index 000000000..5eb7afd18
--- /dev/null
+++ b/tests/ls/sort-width_W-option.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+# Exercise the -W/--sort=width option.
+
+# Copyright (C) 2007-2021 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ ls
+
+mkdir subdir || framework_failure_
+touch subdir/aaaaa || framework_failure_
+touch subdir/bbb || framework_failure_
+touch subdir/cccc || framework_failure_
+touch subdir/d || framework_failure_
+touch subdir/zz || framework_failure_
+
+
+ls -W subdir > out1 || fail=1
+ls --sort=width subdir > out2 || fail=1
+cat <<\EOF > exp || framework_failure_
+d
+zz
+bbb
+cccc
+aaaaa
+EOF
+
+compare exp out1 || fail=1
+compare exp out2 || fail=1
+
+Exit $fail
--
2.17.1