On Wed, 2016-02-03 at 16:20 +0100, Adam Borowski wrote:
> On Wed, Feb 03, 2016 at 10:05:59AM -0500, Michael Stone wrote:
> > On Wed, Feb 03, 2016 at 03:34:21PM +0100, you wrote:
> > > There are cases when it does make sense: unprintable characters.
> > > 
> > > 15:25 < ansgar> Now, 'new'$'\t''line' could also be $'new\tline' or
> > >               new\tline.  But any of these is better than new?line
> > > 
> > > with which I do agree.  Thus, the handling of unprintables could use some
> > > improvement (although '$'\t'' is too long).
> > 
> > I'm actually not convinced this is true. Does it actually matter in general
> > what the specific character is that accidentally got inserted into a
> > filename? Like, when does it matter whether it's randomgarbage\t or
> > randomgarbage\n or randomgarbage\a or whatever? Do people often do anything
> > other than rm randomgarbage<tab> ?

The "\n" was just because I was too lazy to construct a better example.

> You have a point.  I guess, the big question here is: what's the main
> purpose of ls's output?  Is it showing the directory's contents in
> human-friendly way?  Or is it something that needs to be a reversible
> transformation?
> 
> I for one prefer the former, Ansgar seems to want the latter.

A human-friendly reversible transformation ;)

> Also, should we consider junk in file names to be an error or a valid
> use case?
> 
> This reminds me: it's high time to write a kernel patch to ban
> creation of files which fail iswprint(), like some filesystems
> already do for broken Unicode.

I'm fairly sure POSIX requires that almost all garbage can be part of
filenames[1].  Also, userland still doesn't default to UTF-8 when no
LC_* variable is set.  This is what "ls" does then:

$ LC_ALL=C ls ~/Music
??????????????????????????????????????????????????????
?????????????????? ??????????????????????????????vs???????????????
???????????? Original Sound Track
...

(Which is pretty much the same as what "ls" does with non-UTF-8 filenames
in a UTF-8 locale, as I mentioned in an earlier mail.)

That's pretty useless. Any escaped form is more helpful.
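
As a minimal sketch of what a human-friendly but reversible escaped form
could look like (Python for illustration only; this is not what ls
implements, and the names escape_name and unescape_name are made up):

```python
import os


def escape_name(raw: bytes) -> str:
    """Render filename bytes as printable ASCII in a reversible way."""
    out = []
    for b in raw:
        if b == 0x5C:
            out.append("\\\\")         # escape the backslash itself
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))         # printable ASCII passes through
        else:
            out.append(f"\\x{b:02x}")  # everything else becomes \xNN
    return "".join(out)


def unescape_name(text: str) -> bytes:
    """Invert escape_name(); assumes the input was produced by it."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if text[i + 1] == "\\":
                out.append(0x5C)
                i += 2
            else:
                # Only other escape produced above: \xNN
                out.append(int(text[i + 2:i + 4], 16))
                i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    # Listing with a bytes path returns the raw bytes of each name,
    # independent of the locale.
    for entry in os.listdir(b"."):
        print(escape_name(entry))
```

Because every byte outside printable ASCII is written as \xNN, the output
stays readable in the C locale and the original name can still be
recovered from it.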

Ansgar

  [1] At least a "filename" is defined as a byte string consisting of 
      anything except \0 and /:
      http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_170

      There's also a "portable filename character set", but that is
      only [A-Za-z0-9_.-] and mentioned in the definition of
      "pathname": if only characters from the portable set are used
      in the filename, the name is usable in all locales as a character
      string, otherwise it's just a byte string.

      I'm not quite sure what *must* be supported to be usable though;
      I remember reading that "everything" is allowed, but couldn't
      find a reference right now.
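
      The two definitions in this footnote can be sketched roughly as
      follows (Python for illustration; the function names are
      hypothetical, and the {NAME_MAX} length limit is deliberately
      ignored because it depends on the filesystem):

```python
import re

# Portable filename character set as described above: [A-Za-z0-9_.-]
PORTABLE = re.compile(rb"[A-Za-z0-9._-]+")


def is_valid_filename(name: bytes) -> bool:
    """Non-empty byte string without NUL or slash (length limit ignored)."""
    return len(name) > 0 and b"\0" not in name and b"/" not in name


def is_portable_filename(name: bytes) -> bool:
    """True if every byte is in the portable filename character set."""
    return PORTABLE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in (b"README.txt", b"new\tline", b"a/b"):
        print(candidate, is_valid_filename(candidate),
              is_portable_filename(candidate))
```

      A name such as b"new\tline" passes the first check but not the
      second, which is exactly the kind of name under discussion here.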
