Dear Coreutils Maintainers,

I'd like to introduce my favorite 'ls' option, '-W', which I have happily used on a regular basis over the last few years.

The concept is just to sort filenames by their printed widths.


(If this sounds odd, I invite you to hear it out, try it, and see for yourself!)


I am including a patch with my implementation and accompanying tests - as well as some sample output. And I'll happily field any requests for improvements.


But first, some motivation...


The main use case for me has to do with managing filenames in directories, as they are displayed by 'ls' itself.

There is a usual tidy/untidy cycle for me in my homedir, or various other user-managed directories where files tend to accumulate.

The "tidying" part of the cycle involves organizing files into subdirs until a bare 'ls' invocation fits comfortably within a window (e.g. 80x24).

(I feel that the 'ls' output is optimally useful when it fits into a single window, to see and reason about the entire directory's contents at once.)

Then over time, files with names of various lengths accumulate, and before you know it, the output of 'ls' is several window-heights tall. (And I feel the usefulness of the 'ls' column output drops off significantly when you can't see the entire listing in a single window.)

The _various lengths_ part is significant here, because a longer filename makes the entire column it appears in wider. So if you have long filenames mixed in with shorter ones, you end up with mostly whitespace in the 'ls' output. (Which is also to say, filenames become inefficiently "packed" in the column display.)

When this is sufficiently annoying to motivate tidying up again, the first thing is actually to identify the long filenames, which are making a mess of the otherwise-nice default 'ls' column output. Tucking just the long ones into subdirs (or just renaming them to something shorter) is a quick way to condense the directory listing output significantly.


Originally I would identify the longest filenames in a directory with something in the shell like:


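    # Print the arguments NUL-terminated, sorted by length (shortest first).
    # Needs a shell with brace expansion (e.g. bash) and NUL-aware
    # awk, sort, cut and tail (e.g. the GNU versions).
    lsort0 () {
        [ $# -eq 0 ] ||
        printf '%s\0' "$@" |
        awk '{print length, $0}' OFS='\t' {RS,ORS}='\0' |
        sort -zn | cut -zf2-
    }

    # Make NUL-delimited output readable: NUL -> newline, newline -> '?'.
    zlines () { tr '\0\n' '\n?'; }

    # Show the ten longest names.
    lsort0 * | tail -z | zlines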


This does an ok job, but it seems like a lot of tricky work to accomplish, when 'ls' is already designed for listing files sorted by various criteria.

Also notably, the 'length' of the filename is not quite the right thing to measure, as it does not take into account the width of Unicode characters (sometimes 0 or 2 columns), nor (more generally) the actual width that 'ls' uses when displaying the name, which may include various quoting characters.

Really, only 'ls' itself has access to this information, so it can only be done properly if the feature is built into 'ls'.
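
As a rough illustration of the gap between these measurements, the sketch below reports the byte count, character count, and display columns of a single name. The helper name 'name_metrics' is made up for this example, 'wc -L' is a GNU extension, and the results for non-ASCII names depend on the current locale. It still ignores any quoting that 'ls' may add, so it is only an approximation of what 'ls' measures.

```shell
# name_metrics NAME: print three measurements of NAME.
# Hypothetical helper for illustration only; assumes GNU 'wc' for -L.
name_metrics () {
    bytes=$(printf '%s' "$1" | wc -c | tr -d ' ')    # raw bytes
    chars=$(printf '%s' "$1" | wc -m | tr -d ' ')    # characters (locale-dependent)
    cols=$(printf '%s\n' "$1" | wc -L | tr -d ' ')   # display columns (GNU only)
    printf 'bytes=%s chars=%s columns=%s\n' "$bytes" "$chars" "$cols"
}

name_metrics 'ls.c'          # prints: bytes=4 chars=4 columns=4
name_metrics '日本語.txt'    # locale-dependent; the three numbers can all differ
```

For plain ASCII names the three numbers agree, which is why the 'length' approach looks fine until a wide, combining, or quoted name shows up.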


An interesting observation is that if you ask 'ls' to display files in the order of their width, you actually get an optimally-packed column display, in the default column format mode (-C).

This helps identify the outliers for long filenames, but it also looks neat and can easily cut in half the number of lines 'ls' takes to display a directory.

You can get a taste for this using the 'lsort0' function defined above, with an unpatched 'ls':

    lsort0 * | xargs -0 ls -dU --color=auto

(Try it in a messy homedir!  Neat, eh?)

This emulates what the new 'ls -W' does by itself.


(I provide the complicated 'printf | awk | sort | cut | xargs' pipeline, not to demonstrate that the new 'ls -W' option is superfluous, but to show how troublesome it is even to approximate the desired result without the option built into 'ls'.)

Additionally, 'ls -W' can be combined naturally with other 'ls' options like '-a' or '-r', or whatever decoration options you may have defined for your 'ls' alias in LS_OPTIONS.


So, that's what the new ls -W/--sort=width option is all about.

It helps identify the outliers for long filenames, and it also produces a more compact display of columns when listing a directory with many entries of various widths.
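
To make the packing effect concrete, here is a simplified model of a column-major layout, written as a sketch rather than a copy of what 'ls' does internally. The function name 'rows_needed' is made up for this example. It assumes each column is as wide as its longest name, that columns are separated by two spaces, and that awk's 'length' is a good enough stand-in for the display width.

```shell
# rows_needed WIDTH: read one name per line, in display order, and print
# the fewest rows for which a column-major layout fits in WIDTH columns.
# Simplified model, not the actual 'ls' algorithm.
rows_needed () {
    awk -v max="$1" '
        { w[NR] = length($0) }
        END {
            for (rows = 1; rows <= NR; rows++) {
                total = 0
                # Walk the columns; each holds up to "rows" consecutive names.
                for (first = 1; first <= NR; first += rows) {
                    colw = 0
                    for (i = first; i < first + rows && i <= NR; i++)
                        if (w[i] > colw) colw = w[i]
                    # Two-space gap before every column except the first.
                    total += colw + (first > 1 ? 2 : 0)
                }
                if (total <= max) { print rows; exit }
            }
            print NR   # nothing fits: one name per row
        }'
}

# Same four names, 16-column window, two different orders:
printf '%s\n' a bbbbbbbbbb c dddddddddd | rows_needed 16   # prints 4
printf '%s\n' a c bbbbbbbbbb dddddddddd | rows_needed 16   # prints 2
```

In the first order every column contains a long name, so every column is wide. In width order the short names share a narrow column, and the listing needs half as many rows.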


An implementation detail: this sorts files based on ls's internal 'quote_name_width' using the current filename quoting options. So it takes into account the actual width that 'ls' displays for each entry.

And ties are still broken by the default sorting of the filename itself - as is the case with other sort options.
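
The width-then-name ordering can be sketched outside of 'ls' as a two-key sort. This is only an approximation under stated assumptions: the function name 'width_then_name' is made up, names must not contain tabs or newlines, awk's 'length' stands in for 'quote_name_width', and ties are broken bytewise (LC_ALL=C) rather than with the locale's collation.

```shell
# width_then_name: read one name per line; print the names ordered by
# length, with equal-length names ordered by a bytewise comparison.
# Illustration only; not how the patch implements the sort.
width_then_name () {
    tab=$(printf '\t')
    awk '{ print length($0) "\t" $0 }' |      # prefix each name with its length
    LC_ALL=C sort -t "$tab" -k1,1n -k2 |      # key 1: length, key 2: name
    cut -f2-                                  # drop the length prefix
}

printf '%s\n' aaaaa bbb cccc d zz ab | width_then_name
# prints, one per line: d ab zz bbb cccc aaaaa
```

Here 'ab' and 'zz' have the same width, so the name comparison decides their relative order.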


If you try it and you're impressed with how neatly 'ls -W' is able to pack the filenames into columns, at first you might almost think it must be a new 'ls --format' option; but really all it does is change the sort order.



That's about it. Thanks for your consideration, and I hope many find this to be as useful & enjoyable as I do.


Carl


-=-=+=-=-


* Demo! *


[coreutils/src]$ ls  # normal output
basename.c        expand.c            make-prime-list.c  shred.c
basenc.c          expr.c              make-prime-list.o  shuf.c
blake2            extent-scan.c       md5sum.c           single-binary.mk
cat.c             extent-scan.h       mkdir.c            sleep.c
chcon.c           extract-magic       mkfifo.c           sort.c
chgrp.c           factor.c            mknod.c            split.c
chmod.c           false.c             mktemp.c           stat.c
chown-core.c      fiemap.h            mv.c               statx.h
chown-core.h      find-mount-point.c  nice.c             stdbuf.c
chown.c           find-mount-point.h  nl.c               stty.c
chroot.c          fmt.c               nohup.c            sum.c
cksum.c           fold.c              nproc.c            sync.c
comm.c            force-link.c        numfmt.c           system.h
copy.c            force-link.h        od.c               tac-pipe.c
copy.h            fs-is-local.h       operand2sig.c      tac.c
coreutils-arch.c  fs.h                operand2sig.h      tail.c
coreutils-dir.c   getlimits.c         paste.c            tee.c
coreutils-vdir.c  group-list.c        pathchk.c          test.c
coreutils.c       group-list.h        pinky.c            timeout.c
coreutils.h       groups.c            pr.c               touch.c
cp-hash.c         head.c              primes.h           tr.c
cp-hash.h         hostid.c            printenv.c         true.c
cp.c              hostname.c          printf.c           truncate.c
csplit.c          id.c                prog-fprintf.c     tsort.c
cu-progs.mk       install.c           prog-fprintf.h     tty.c
cut.c             ioblksize.h         ptx.c              uname-arch.c
date.c            join.c              pwd.c              uname-uname.c
dcgen             kill.c              readlink.c         uname.c
dd.c              lbracket.c          realpath.c         uname.h
df.c              libstdbuf.c         relpath.c          unexpand.c
die.h             link.c              relpath.h          uniq.c
dircolors.c       ln.c                remove.c           unlink.c
dircolors.h       local.mk            remove.h           uptime.c
dircolors.hin     logname.c           rm.c               users.c
dirname.c         longlong.h          rmdir.c            version.c
du-tests          ls-dir.c            runcon.c           version.h
du.c              ls-ls.c             selinux.c          wc.c
echo.c            ls-vdir.c           selinux.h          who.c
env.c             ls.c                seq.c              whoami.c
expand-common.c   ls.h                set-fields.c       yes.c
expand-common.h   make-prime-list     set-fields.h


[coreutils/src]$ ls -W  # sort by width
cp.c   seq.c   sync.c   tsort.c   stdbuf.c    readlink.c     extent-scan.h
dd.c   sum.c   tail.c   uname.c   system.h    realpath.c     extract-magic
df.c   tac.c   test.c   uname.h   unlink.c    tac-pipe.c     fs-is-local.h
du.c   tee.c   true.c   users.c   uptime.c    truncate.c     operand2sig.c
fs.h   tty.c   uniq.c   basenc.c  whoami.c    unexpand.c     operand2sig.h
id.c   who.c   chcon.c  chroot.c  cp-hash.c   coreutils.c    uname-uname.c
ln.c   yes.c   chgrp.c  csplit.c  cp-hash.h   coreutils.h    prog-fprintf.c
ls.c   blake2  chmod.c  du-tests  dirname.c   cu-progs.mk    prog-fprintf.h
ls.h   comm.c  chown.c  expand.c  install.c   dircolors.c    coreutils-dir.c
mv.c   copy.c  cksum.c  factor.c  logname.c   dircolors.h    expand-common.c
nl.c   copy.h  false.c  fiemap.h  ls-vdir.c   getlimits.c    expand-common.h
od.c   date.c  ls-ls.c  groups.c  pathchk.c   ioblksize.h    make-prime-list
pr.c   echo.c  mkdir.c  hostid.c  relpath.c   libstdbuf.c    coreutils-arch.c
rm.c   expr.c  mknod.c  local.mk  relpath.h   chown-core.c   coreutils-vdir.c
tr.c   fold.c  nohup.c  ls-dir.c  selinux.c   chown-core.h   single-binary.mk
wc.c   head.c  nproc.c  md5sum.c  selinux.h   force-link.c   make-prime-list.c
cat.c  join.c  paste.c  mkfifo.c  timeout.c   force-link.h   make-prime-list.o
cut.c  kill.c  pinky.c  mktemp.c  version.c   group-list.c   find-mount-point.c
dcgen  link.c  rmdir.c  numfmt.c  version.h   group-list.h   find-mount-point.h
die.h  nice.c  shred.c  primes.h  basename.c  set-fields.c
env.c  shuf.c  sleep.c  printf.c  hostname.c  set-fields.h
fmt.c  sort.c  split.c  remove.c  lbracket.c  uname-arch.c
ptx.c  stat.c  statx.h  remove.h  longlong.h  dircolors.hin
pwd.c  stty.c  touch.c  runcon.c  printenv.c  extent-scan.c


[coreutils/src]$ # accumulate some long filenames...
[coreutils/src]$ touch {a,z}-some-obnoxiously-longish-filename

[coreutils/src]$ ls  # normal output, now much taller
a-some-obnoxiously-longish-filename  make-prime-list.c
basename.c                           make-prime-list.o
basenc.c                             md5sum.c
blake2                               mkdir.c
cat.c                                mkfifo.c
chcon.c                              mknod.c
chgrp.c                              mktemp.c
chmod.c                              mv.c
chown-core.c                         nice.c
chown-core.h                         nl.c
chown.c                              nohup.c
chroot.c                             nproc.c
cksum.c                              numfmt.c
comm.c                               od.c
copy.c                               operand2sig.c
copy.h                               operand2sig.h
coreutils-arch.c                     paste.c
coreutils-dir.c                      pathchk.c
coreutils-vdir.c                     pinky.c
coreutils.c                          pr.c
coreutils.h                          primes.h
cp-hash.c                            printenv.c
cp-hash.h                            printf.c
cp.c                                 prog-fprintf.c
csplit.c                             prog-fprintf.h
cu-progs.mk                          ptx.c
cut.c                                pwd.c
date.c                               readlink.c
dcgen                                realpath.c
dd.c                                 relpath.c
df.c                                 relpath.h
die.h                                remove.c
dircolors.c                          remove.h
dircolors.h                          rm.c
dircolors.hin                        rmdir.c
dirname.c                            runcon.c
du-tests                             selinux.c
du.c                                 selinux.h
echo.c                               seq.c
env.c                                set-fields.c
expand-common.c                      set-fields.h
expand-common.h                      shred.c
expand.c                             shuf.c
expr.c                               single-binary.mk
extent-scan.c                        sleep.c
extent-scan.h                        sort.c
extract-magic                        split.c
factor.c                             stat.c
false.c                              statx.h
fiemap.h                             stdbuf.c
find-mount-point.c                   stty.c
find-mount-point.h                   sum.c
fmt.c                                sync.c
fold.c                               system.h
force-link.c                         tac-pipe.c
force-link.h                         tac.c
fs-is-local.h                        tail.c
fs.h                                 tee.c
getlimits.c                          test.c
group-list.c                         timeout.c
group-list.h                         touch.c
groups.c                             tr.c
head.c                               true.c
hostid.c                             truncate.c
hostname.c                           tsort.c
id.c                                 tty.c
install.c                            uname-arch.c
ioblksize.h                          uname-uname.c
join.c                               uname.c
kill.c                               uname.h
lbracket.c                           unexpand.c
libstdbuf.c                          uniq.c
link.c                               unlink.c
ln.c                                 uptime.c
local.mk                             users.c
logname.c                            version.c
longlong.h                           version.h
ls-dir.c                             wc.c
ls-ls.c                              who.c
ls-vdir.c                            whoami.c
ls.c                                 yes.c
ls.h                                 z-some-obnoxiously-longish-filename
make-prime-list


[coreutils/src]$ ls -W  # sort by width for much denser output
cp.c    copy.c   rmdir.c   uptime.c     libstdbuf.c
dd.c    copy.h   shred.c   whoami.c     chown-core.c
df.c    date.c   sleep.c   cp-hash.c    chown-core.h
du.c    echo.c   split.c   cp-hash.h    force-link.c
fs.h    expr.c   statx.h   dirname.c    force-link.h
id.c    fold.c   touch.c   install.c    group-list.c
ln.c    head.c   tsort.c   logname.c    group-list.h
ls.c    join.c   uname.c   ls-vdir.c    set-fields.c
ls.h    kill.c   uname.h   pathchk.c    set-fields.h
mv.c    link.c   users.c   relpath.c    uname-arch.c
nl.c    nice.c   basenc.c  relpath.h    dircolors.hin
od.c    shuf.c   chroot.c  selinux.c    extent-scan.c
pr.c    sort.c   csplit.c  selinux.h    extent-scan.h
rm.c    stat.c   du-tests  timeout.c    extract-magic
tr.c    stty.c   expand.c  version.c    fs-is-local.h
wc.c    sync.c   factor.c  version.h    operand2sig.c
cat.c   tail.c   fiemap.h  basename.c   operand2sig.h
cut.c   test.c   groups.c  hostname.c   uname-uname.c
dcgen   true.c   hostid.c  lbracket.c   prog-fprintf.c
die.h   uniq.c   local.mk  longlong.h   prog-fprintf.h
env.c   chcon.c  ls-dir.c  printenv.c   coreutils-dir.c
fmt.c   chgrp.c  md5sum.c  readlink.c   expand-common.c
ptx.c   chmod.c  mkfifo.c  realpath.c   expand-common.h
pwd.c   chown.c  mktemp.c  tac-pipe.c   make-prime-list
seq.c   cksum.c  numfmt.c  truncate.c   coreutils-arch.c
sum.c   false.c  primes.h  unexpand.c   coreutils-vdir.c
tac.c   ls-ls.c  printf.c  coreutils.c  single-binary.mk
tee.c   mkdir.c  remove.c  coreutils.h  make-prime-list.c
tty.c   mknod.c  remove.h  cu-progs.mk  make-prime-list.o
who.c   nohup.c  runcon.c  dircolors.c  find-mount-point.c
yes.c   nproc.c  stdbuf.c  dircolors.h  find-mount-point.h
blake2  paste.c  system.h  getlimits.c  a-some-obnoxiously-longish-filename
comm.c  pinky.c  unlink.c  ioblksize.h  z-some-obnoxiously-longish-filename


From dc7cd08682a7618e1bb2ef9764960e39de14237f Mon Sep 17 00:00:00 2001
From: Carl Edquist <edqu...@cs.wisc.edu>
Date: Fri, 26 Mar 2021 04:27:54 -0500
Subject: [PATCH] ls: add --sort=width (-W) option to sort by filename width

This helps identify the outliers for long filenames, and also produces
a more compact display of columns when listing a directory with many
entries of various widths.

* src/ls.c (sort_type, sort_types, sort_width): New sort_width sort
type.
(sort_args): Add "width" sort arg.
(decode_switches): Parse '-W' option.
(cmp_width, fileinfo_width): New sort function and helper for filename
width.
(quote_name_width): Add function prototype declaration.
(usage): Document -W/--sort=width option.
* doc/coreutils.texi: Document -W/--sort=width option.
* tests/local.mk: Add new test.
* tests/ls/sort-width_W-option.sh: Exercise --sort=width and -W options.
* NEWS: Mention the new feature.
---
 NEWS                            |  2 ++
 doc/coreutils.texi              |  7 ++++++
 src/ls.c                        | 36 ++++++++++++++++++++++++---
 tests/local.mk                  |  1 +
 tests/ls/sort-width_W-option.sh | 43 +++++++++++++++++++++++++++++++++
 5 files changed, 85 insertions(+), 4 deletions(-)
 create mode 100755 tests/ls/sort-width_W-option.sh

diff --git a/NEWS b/NEWS
index 802f4b427..4ba164e85 100644
--- a/NEWS
+++ b/NEWS
@@ -70,6 +70,8 @@ GNU coreutils NEWS                                    -*- outline -*-
   ls --classify now supports the "always", "auto", or "never" flags,
   to support only outputting classifier characters if connected to a tty.
 
+  ls now accepts the --sort=width (-W) option, to sort by filename width.
+
   nl --line-increment can now take a negative number to decrement the count.
 
 ** Improvements
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 06ecdd74c..0c7bb8d44 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -7939,6 +7939,13 @@ Sort by version name and number, lowest first.  It behaves like a default
 sort, except that each sequence of decimal digits is treated numerically
 as an index/version number.  (@xref{Version sort ordering}.)
 
+@item -W
+@itemx --sort=width
+@opindex -W
+@opindex --sort
+@opindex width@r{, sorting option for @command{ls}}
+Sort by printed width of filenames.
+
 @item -X
 @itemx --sort=extension
 @opindex -X
diff --git a/src/ls.c b/src/ls.c
index 2d0450e54..12ea550e3 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -307,6 +307,10 @@ static void parse_ls_color (void);
 
 static void getenv_quoting_style (void);
 
+static size_t quote_name_width (const char *name,
+                                struct quoting_options const *options,
+                                int needs_general_quoting);
+
 /* Initial size of hash table.
    Most hierarchies are likely to be shallower than this.  */
 #define INITIAL_TABLE_SIZE 30
@@ -475,6 +479,7 @@ enum sort_type
     sort_none = -1,		/* -U */
     sort_name,			/* default */
     sort_extension,		/* -X */
+    sort_width,			/* -W */
     sort_size,			/* -S */
     sort_version,		/* -v */
     sort_time,			/* -t */
@@ -903,11 +908,11 @@ ARGMATCH_VERIFY (format_args, format_types);
 
 static char const *const sort_args[] =
 {
-  "none", "time", "size", "extension", "version", NULL
+  "none", "time", "size", "extension", "version", "width", NULL
 };
 static enum sort_type const sort_types[] =
 {
-  sort_none, sort_time, sort_size, sort_extension, sort_version
+  sort_none, sort_time, sort_size, sort_extension, sort_version, sort_width
 };
 ARGMATCH_VERIFY (sort_args, sort_types);
 
@@ -1958,7 +1963,7 @@ decode_switches (int argc, char **argv)
     {
       int oi = -1;
       int c = getopt_long (argc, argv,
-                           "abcdfghiklmnopqrstuvw:xABCDFGHI:LNQRST:UXZ1",
+                           "abcdfghiklmnopqrstuvw:xABCDFGHI:LNQRST:UWXZ1",
                            long_options, &oi);
       if (c == -1)
         break;
@@ -2155,6 +2160,11 @@ decode_switches (int argc, char **argv)
           sort_type_specified = true;
           break;
 
+        case 'W':
+          sort_type = sort_width;
+          sort_type_specified = true;
+          break;
+
         case 'X':
           sort_type = sort_extension;
           sort_type_specified = true;
@@ -3877,6 +3887,20 @@ cmp_extension (struct fileinfo const *a, struct fileinfo const *b,
   return diff ? diff : cmp (a->name, b->name);
 }
 
+static inline size_t
+fileinfo_width (struct fileinfo const *f)
+{
+  return quote_name_width (f->name, filename_quoting_options, f->quoted);
+}
+
+static inline int
+cmp_width (struct fileinfo const *a, struct fileinfo const *b,
+          int (*cmp) (char const *, char const *))
+{
+  int diff = fileinfo_width (a) - fileinfo_width (b);
+  return diff ? diff : cmp (a->name, b->name);
+}
+
 DEFINE_SORT_FUNCTIONS (ctime, cmp_ctime)
 DEFINE_SORT_FUNCTIONS (mtime, cmp_mtime)
 DEFINE_SORT_FUNCTIONS (atime, cmp_atime)
@@ -3884,6 +3908,7 @@ DEFINE_SORT_FUNCTIONS (btime, cmp_btime)
 DEFINE_SORT_FUNCTIONS (size, cmp_size)
 DEFINE_SORT_FUNCTIONS (name, cmp_name)
 DEFINE_SORT_FUNCTIONS (extension, cmp_extension)
+DEFINE_SORT_FUNCTIONS (width, cmp_width)
 
 /* Compare file versions.
    Unlike all other compare functions above, cmp_version depends only
@@ -3936,6 +3961,7 @@ static qsortFunc const sort_functions[][2][2][2] =
   {
     LIST_SORTFUNCTION_VARIANTS (name),
     LIST_SORTFUNCTION_VARIANTS (extension),
+    LIST_SORTFUNCTION_VARIANTS (width),
     LIST_SORTFUNCTION_VARIANTS (size),
 
     {
@@ -5454,7 +5480,8 @@ Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.\n\
   -S                         sort by file size, largest first\n\
       --sort=WORD            sort by WORD instead of name: none (-U), size (-S)\
 ,\n\
-                               time (-t), version (-v), extension (-X)\n\
+                               time (-t), version (-v), extension (-X),\n\
+                               width (-W)\n\
       --time=WORD            change the default of using modification times;\n\
                                access time (-u): atime, access, use;\n\
                                change time (-c): ctime, status;\n\
@@ -5478,6 +5505,7 @@ Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.\n\
 "), stdout);
       fputs (_("\
   -w, --width=COLS           set output width to COLS.  0 means no limit\n\
+  -W                         sort by entry name width\n\
   -x                         list entries by lines instead of by columns\n\
   -X                         sort alphabetically by entry extension\n\
   -Z, --context              print any security context of each file\n\
diff --git a/tests/local.mk b/tests/local.mk
index 27e31ec8e..41981ffd6 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -632,6 +632,7 @@ all_tests =					\
   tests/ls/symlink-quote.sh			\
   tests/ls/symlink-slash.sh			\
   tests/ls/time-style-diag.sh			\
+  tests/ls/sort-width_W-option.sh		\
   tests/ls/x-option.sh				\
   tests/ls/hyperlink.sh				\
   tests/mkdir/p-1.sh				\
diff --git a/tests/ls/sort-width_W-option.sh b/tests/ls/sort-width_W-option.sh
new file mode 100755
index 000000000..5eb7afd18
--- /dev/null
+++ b/tests/ls/sort-width_W-option.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+# Exercise the -W/--sort=width option.
+
+# Copyright (C) 2007-2021 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ ls
+
+mkdir subdir       || framework_failure_
+touch subdir/aaaaa || framework_failure_
+touch subdir/bbb   || framework_failure_
+touch subdir/cccc  || framework_failure_
+touch subdir/d     || framework_failure_
+touch subdir/zz    || framework_failure_
+
+
+ls -W           subdir > out1 || fail=1
+ls --sort=width subdir > out2 || fail=1
+cat <<\EOF > exp || framework_failure_
+d
+zz
+bbb
+cccc
+aaaaa
+EOF
+
+compare exp out1 || fail=1
+compare exp out2 || fail=1
+
+Exit $fail
-- 
2.17.1
