On Wed, 2016-02-03 at 16:20 +0100, Adam Borowski wrote:
> On Wed, Feb 03, 2016 at 10:05:59AM -0500, Michael Stone wrote:
> > On Wed, Feb 03, 2016 at 03:34:21PM +0100, you wrote:
> > > There are cases when it does make sense: unprintable characters.
> > >
> > > 15:25 < ansgar> Now, 'new'$'\t''line' could also be $'new\tline' or
> > >                 new\tline. But any of these is better than new?line
> > >
> > > with which I do agree. Thus, the handling of unprintables could use
> > > some improvement (although '$'\t'' is too long).
> >
> > I'm actually not convinced this is true. Does it actually matter in
> > general what the specific character is that accidentally got inserted
> > into a filename? Like, when does it matter whether it's randomgarbage\t
> > or randomgarbage\n or randomgarbage\a or whatever? Do people often do
> > anything other than rm randomgarbage<tab> ?

The "\n" was just because I was too lazy to construct a better example.

> You have a point. I guess, the big question here is: what's the main
> purpose of ls's output? Is it showing the directory's contents in
> human-friendly way? Or is it something that needs to be a reversible
> transformation?
>
> I for one prefer the former, Ansgar seems to want the latter.

A human-friendly reversible transformation ;)

> Also, should we consider junk in file names to be an error or a valid
> use case?
>
> This reminds me: it's high time to write a kernel patch to ban
> creation of files which fail iswprint(), like some filesystems
> already do for broken Unicode.

I'm fairly sure POSIX requires that almost all garbage can be part of
filenames[1]. Also, userland still doesn't default to UTF-8 when no
LC_* variable is set. This is what "ls" does then:

$ LC_ALL=C ls ~/Music
??????????????????????????????????????????????????????
??????????????????
??????????????????????????????vs???????????????
???????????? Original Sound Track
...

(Which is pretty much the same as what "ls" does with non-UTF-8
filenames in a UTF-8 locale, as I mentioned in an earlier mail.)

That's pretty useless. Any escaped form is more helpful.

Ansgar

[1] At least a "filename" is defined as a byte string consisting of
    anything except \0 and /:
    http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_170
    There's also a "portable filename character set", but that is only
    [A-Za-z0-9_.-] and mentioned in the definition of "pathname": if
    only characters from the portable set are used in the filename, the
    name is usable in all locales as a character string, otherwise it's
    just a string. I'm not quite sure what *must* be supported to be
    usable though; I remember reading that "everything" is allowed, but
    couldn't find a reference right now.
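
To make "human-friendly reversible transformation" a bit more concrete,
here is a minimal sketch in Python. It is not what ls does or should do;
the function names escape_name and unescape_name, and the choice of
C-style backslash escapes, are assumptions made only for illustration.
Printable characters are shown as they are, a few control characters get
short names, and everything else (including bytes that are not valid
UTF-8) becomes \xHH, so the original byte string can always be recovered.

```python
import re

# Short C-style names for a few common control characters (an assumed subset).
_NAMED = {"\a": "\\a", "\b": "\\b", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNNAMED = {v[1]: k.encode("ascii") for k, v in _NAMED.items()}

# A backslash followed by either "xHH" (lowercase hex) or one other character.
_ESCAPE = re.compile(r"\\(x[0-9a-f]{2}|.)", re.DOTALL)


def escape_name(raw: bytes) -> str:
    """Render filename bytes as printable text that can be mapped back."""
    out = []
    # surrogateescape keeps undecodable bytes as lone surrogates U+DC80..U+DCFF.
    for ch in raw.decode("utf-8", errors="surrogateescape"):
        if ch == "\\":
            out.append("\\\\")          # keep the escape character unambiguous
        elif ch in _NAMED:
            out.append(_NAMED[ch])
        elif ch.isprintable():
            out.append(ch)              # printable text stays readable
        else:
            # Control characters, lone surrogates, etc.: emit the raw bytes.
            data = ch.encode("utf-8", errors="surrogateescape")
            out.append("".join(f"\\x{b:02x}" for b in data))
    return "".join(out)


def unescape_name(text: str) -> bytes:
    """Invert escape_name(); only meant for strings that escape_name produced."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        out += text[pos:m.start()].encode("utf-8")
        body = m.group(1)
        if len(body) == 3:              # "xHH"
            out.append(int(body[1:], 16))
        elif body == "\\":
            out += b"\\"
        else:
            out += _UNNAMED[body]
        pos = m.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    for name in (b"new\tline", b"caf\xc3\xa9", b"bad\xff", b"a\\b"):
        shown = escape_name(name)
        assert unescape_name(shown) == name
        print(shown)
```

With this, b"new\tline" is shown as new\tline and an invalid byte such as
0xff as \xff, while valid UTF-8 text stays readable. A real tool would
also have to consult the locale rather than assume UTF-8.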

