Kaixo!

On Wed, Feb 20, 2002 at 10:33:16AM -0800, H. Peter Anvin wrote:

> There is a fairly important point here: POSIX requires that a filename
> can consist of any sequence of bytes other than '/' and '\0'. Valid

I read this as "the system must be able to manipulate them", not as
"use of any arbitrary and meaningless flow of bytes is encouraged".

> UTF-8 sequences are a subset of this. This presumably means that
> illegal UTF-8 sequences should be accepted by the server as valid
> filenames;

Yes.

> This *DOES*, however, have implications for applications that use
> filenames, such as the shell or ls. Traditionally, nonreadable
> filenames have been displayed using "escape codes". With UTF-8, there
> are now two levels of "nonreadableness":
>
> a) Those that don't correspond to any valid UTF-8 sequences.
>    \xc2\x7f would be such a sequence.
>
> b) Those that don't correspond to a displayable Unicode.
>    \xf3\xa1\x88\xb4 a.k.a. \U000E1234 would be such a sequence.
>
> Since it is very important that the shell can access any file that can
> exist in the system, I believe there should be a standard (formal or
> informal) proposed for how to display these escape codes.

IMHO it just has to be done the same way as it is now.
Currently you can have a filename containing bytes in 0x01-0x1F and
0x7F-0x9F; however, you usually cannot type those directly.
You can use representations such as \x88, or that lovely
tab-completion feature (if the filename starts with something typable),
or a tool that lets you pick the file from a menu (that is my preferred
way to delete "bizarre" file names: select them in "mc" and press F8;
it is much easier).

> I think the
> C standard has given us a fairly nice example to follow, and as shown
> above I advocate using \U or \u escapes for valid UTF-8, and \x or
> octal escapes for invalid UTF-8.

-- 
Ki ça vos våye bén,
Pablo Saratxaga
http://www.srtxg.easynet.be/
PGP Key available, key ID: 0x8F0E4975

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
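
For illustration, the two levels of "nonreadableness" discussed in the
quoted text could be displayed roughly as follows. This is only a
minimal sketch in Python 3, not something any shell or ls actually
implements: the function name escape_filename is hypothetical, the
choice to double backslashes is an added assumption to keep the output
unambiguous, and "displayable" is approximated by Python's
str.isprintable(), which may differ from what a given terminal or font
can show.

```python
def escape_filename(raw: bytes) -> str:
    """Return a displayable form of a raw filename (hypothetical helper).

    Bytes that are not valid UTF-8 are shown as \\xNN escapes; valid but
    non-printable code points are shown as \\uXXXX or \\UXXXXXXXX escapes.
    """
    # surrogateescape maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, so invalid bytes can be told apart afterwards.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Level (a): a byte that is not part of any valid UTF-8 sequence.
            out.append(f"\\x{cp - 0xDC00:02x}")
        elif ch == "\\":
            # Assumption: double the backslash so escapes stay unambiguous.
            out.append("\\\\")
        elif ch.isprintable():
            out.append(ch)
        elif cp <= 0xFFFF:
            # Level (b): valid UTF-8, but not printable (BMP).
            out.append(f"\\u{cp:04X}")
        else:
            # Level (b): valid UTF-8, but not printable (beyond the BMP).
            out.append(f"\\U{cp:08X}")
    return "".join(out)


if __name__ == "__main__":
    print(escape_filename(b"\xc2\x7f"))          # \xc2\u007F
    print(escape_filename(b"\xf3\xa1\x88\xb4"))  # \U000E1234
```

Note that in the first example only the \xc2 byte is invalid; the
following 0x7F is a valid (but non-printable) one-byte sequence, so it
gets a \u escape rather than a \x escape.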
