On Fri, Feb 22, 2002 at 12:55:31AM +0100, Pablo Saratxaga wrote:
> > OTOH, the unprinting character problem is important. Would it be
> > reasonable to escape (\u) characters with wcwidth(c)==0 (in tool output,
> > ie ls -b), or is there some reasonable use of them in filenames?
>
> There are reasonable uses of zwj and zwnj and similar; they are needed
> for proper writing in some languages.
>
> In fact, all the trouble comes from the xterm, not from "ls".

If a filename is a BOM followed by "hello", how can I enter it? I don't expect my terminal emulator to remember every control character sent at any cursor position and paste them along with the other characters, so I'd end up pasting "hello" alone. It's worse when the filename is *only* nonprinting characters and there's nothing on screen to copy at all. (That's just plain confusing, too.)

We can't blame the terminal for not being able to copy and paste arbitrary sequences of bytes. It's not ls's "fault" either, per se (the problem is inherent), but that doesn't mean ls can't help.

> I would say that ls should not escape them, only invalid utf-8 and
> control chars.
>
> Then, another command line switch should be added to "escape all but
> printable ascii".

Well, I'd like all nonprinting characters escaped, but not, say, 日本語. That way I can copy and paste the filename, and characters that *can* be copied and pasted aren't escaped. (But see below.)

> More complex options are not to be done in the command line on an xterm;
> a graphical toolkit is more suited for that.

It's acceptable to go from "able to type all filenames with the keyboard" to "need to copy and paste filenames which I can't type directly". That's reasonable, if only because it's unavoidable. (As has been pointed out, it's already the case with ISO-8859-1.) It's not acceptable to have filenames that I can't reliably access from a CLI (with copy and paste) at all, or that I can only reach by switching to a special ls mode that escapes *everything* above ASCII.

Wildcards are a useful fallback, but they don't stand alone: they still wouldn't help me target a file consisting only of control characters, for example.

Telling me to "use a GUI" is simply no good. (I'm not installing X on a 486 running FTP to delete a file someone dumped in my /incoming.) Files are an extremely fundamental part of a Unix system, and all fundamental parts of Unix are accessible from a CLI.
That's always been one of its greatest strengths, and we can't throw that away for filenames. This is why GNU ls supports escaping.

> the reason is that with ls/xterm the rendering and the tool handling the
> filenames are dissociated, so you cannot easily do interesting things,

ls supports escaping that matches bash's (\ooo, \xHH, \n, etc.). If this is extended to include \uXXXX and \UXXXXXXXX, then ls can be extended to optionally (for the sake of compatibility) display escape characters and the like in that form. (I think that extension is useful whether or not ls uses it.) Just because the tools aren't maintained by the same person doesn't mean there can't be cooperation. (Though, considering how difficult it's proving to be to get UTF-8 support into bash at all, I don't expect *all* shells to support this.) This doesn't involve xterm (or any terminal) at all, just the shell and the tools.

> So, the only interesting change that would be worth doing for the
> use of utf-8 in filenames will be an extra switch to ls to quote
> everything but ascii, and ensure it quotes incorrect utf-8 when the
> locale is in utf-8 mode.

I disagree; I think it's interesting, useful and practical to escape certain other cases: leading combining characters, probably, and any characters not useful in filenames. (Of course, it's not necessarily easy to determine what's useful. I don't see BIDI support in filenames as useful, since directionality seems to be a property of whatever text is displaying the filenames, not of the filenames themselves; but I'm not a BIDI user, so I can only guess.)

I'm unclear on how control characters that change state behave in filenames at all. To pick a simple example, what if a filename contains the language code "zh"? I can no longer write a simple C program that outputs "The first file is %s. The second file is %s. [...]", since the text after the first %s is now marked as Chinese. (This probably won't break anything, but other control characters probably would.)
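
To make the escaping proposal concrete, here is a minimal Python sketch of a filename escaper loosely in the spirit of `ls -b`, extended with the `\uXXXX`/`\UXXXXXXXX` forms suggested above. It is an illustration under assumptions, not what GNU ls does: `escape_filename` and `unescape_filename` are hypothetical names, and the policy is just one guess at "characters not useful in filenames". It escapes bytes that are not valid UTF-8 (as `\xHH`), ASCII controls (as `\n`, `\t`, or octal such as `\033`), and other control, format, unassigned and line/paragraph-separator characters (as `\u`/`\U`), plus a combining mark at the very start of a name. Format characters include the BOM and Unicode's plane-14 language-tag characters, which is presumably how a filename would carry a language code like "zh". ZWJ and ZWNJ are deliberately passed through, following Pablo's point that some languages need them.

```python
import re
import unicodedata

# Assumed policy: ZWNJ (U+200C) and ZWJ (U+200D) are format characters, but
# some languages need them, so this sketch passes them through unescaped.
KEEP = {"\u200c", "\u200d"}

# General categories escaped wherever they appear: controls, format
# characters (BOM, language tags), unassigned code points, and the
# line/paragraph separators.
ESCAPE_CATEGORIES = {"Cc", "Cf", "Cn", "Zl", "Zp"}

# Short bash-style escapes. The backslash itself is always escaped, so every
# backslash in the output starts an escape and decoding is unambiguous.
SIMPLE = {"\a": "\\a", "\b": "\\b", "\t": "\\t", "\n": "\\n",
          "\v": "\\v", "\f": "\\f", "\r": "\\r", "\\": "\\\\"}
SIMPLE_BACK = {esc[1]: ch for ch, esc in SIMPLE.items()}

# Every escape has a fixed width, so a digit following one stays literal.
ESCAPE_RE = re.compile(
    r"\\(U[0-9A-Fa-f]{8}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[0-3][0-7]{2}|[abtnvfr\\])")


def needs_escape(ch: str, first: bool) -> bool:
    if ch in KEEP:
        return False
    cat = unicodedata.category(ch)
    # A combining mark at the start of a name has no base character to
    # attach to, so it would vanish or combine with preceding output.
    return cat in ESCAPE_CATEGORIES or (first and cat in {"Mn", "Mc", "Me"})


def escape_filename(raw: bytes) -> str:
    """Render a raw filename so that every character is visible and copyable."""
    # Bytes that are not valid UTF-8 decode to lone surrogates U+DC80..U+DCFF.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("\\x%02X" % (cp - 0xDC00))  # invalid UTF-8 byte
        elif ch in SIMPLE:
            out.append(SIMPLE[ch])
        elif not needs_escape(ch, i == 0):
            out.append(ch)                         # printable: keep as-is
        elif cp < 0x80:
            out.append("\\%03o" % cp)              # ASCII control, e.g. ESC
        elif cp < 0x10000:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


def unescape_filename(s: str) -> bytes:
    """Invert escape_filename, recovering the exact original bytes."""
    out = bytearray()
    pos = 0
    for m in ESCAPE_RE.finditer(s):
        out += s[pos:m.start()].encode("utf-8")    # literal text between escapes
        tok = m.group(1)
        if tok[0] in "uU":
            out += chr(int(tok[1:], 16)).encode("utf-8")
        elif tok[0] == "x":
            out.append(int(tok[1:], 16))           # raw byte
        elif tok[0].isdigit():
            out.append(int(tok, 8))                # octal byte
        else:
            out += SIMPLE_BACK[tok].encode("utf-8")
        pos = m.end()
    out += s[pos:].encode("utf-8")
    return bytes(out)
```

With this policy a BOM followed by "hello" prints as `\uFEFFhello`, 日本語 is left untouched, and an embedded ESC shows up as `\033` instead of reaching the terminal. Because the escaped form decodes back to exactly the original bytes, copying and pasting it is enough to name the file, provided the shell or tool on the other end accepts the same escapes. Whether a character counts as unassigned (`Cn`) depends on the Unicode version of the character database in use, so results for newly assigned characters may differ.
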
Should ls invalidate all state after outputting each filename? That gets complicated. (I don't know what zwj and zwnj do; perhaps a more practical example could be made with them.) Anyone feel like filling me in here?

A state-changing character in a filename would be like embedding ANSI color sequences in filenames and having ls let them through: the color would bleed onto the next line unless ls knew to reset the color after each filename.

-- 
Glenn Maynard

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
