Kaixo!

On Wed, Feb 20, 2002 at 10:33:16AM -0800, H. Peter Anvin wrote:

> There is a fairly important point here: POSIX requires that a filename
> can consist of any sequence of bytes other than '/' and '\0'.  Valid

I read this as "the system must be able to manipulate them", not as
"use of any arbitrary and meaningless flow of bytes is encouraged".

> UTF-8 sequences are a subset of this.  This presumably means that
> illegal UTF-8 sequences should be accepted by the server as valid
> filenames;

Yes.

> This *DOES*, however, have implications for applications that use
> filenames, such as the shell or ls.  Traditionally, nonreadable
> filenames have been displayed using "escape codes".  With UTF-8, there
> are now two levels of "nonreadableness":
> 
> a) Those that don't correspond to any valid UTF-8 sequences.
>    \xc2\x7f would be such a sequence.
> 
> b) Those that don't correspond to a displayable Unicode.
>    \xf3\xa1\x88\xb4 a.k.a. \U000E1234 would be such a sequence.
> 
> Since it is very important that the shell can access any file that can
> exist in the system, I believe there should be a standard (formal or
> informal) proposed for how to display these escape codes.

IMHO it should simply be done the same way as it is now.

Currently you can have a filename with bytes in 0x01-0x1F and 0x7F-0x9F,
but you usually cannot type those directly.
You can use \x88-style escape representations, or
tab completion (if the filename starts with
something typable), or a tool that lets you pick the
file from a menu (that is my preferred way to delete "bizarre" file names:
select them in "mc" and press F8; it is much easier).

> I think the
> C standard has given us a fairly nice example to follow, and as shown
> above I advocate using \U or \u escapes for valid UTF-8, and \x or
> octal escapes for invalid UTF-8.
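
The rule quoted above could be sketched roughly as follows in Python. This
is only an illustration under stated assumptions, not a proposed standard:
the function name escape_filename is hypothetical, "displayable" is
approximated by Python's str.isprintable(), and a literal backslash is
doubled so the output stays unambiguous.

```python
def escape_filename(raw: bytes) -> str:
    """Render a raw filename for display (hypothetical helper).

    Displayable characters pass through unchanged; valid UTF-8 that is
    not displayable becomes \\uXXXX or \\UXXXXXXXX; bytes that are not
    part of any valid UTF-8 sequence become \\xNN.
    """
    out = []
    i = 0
    while i < len(raw):
        # Try to decode exactly one UTF-8 character (1 to 4 bytes) at i.
        for width in range(1, 5):
            try:
                ch = raw[i:i + width].decode("utf-8")
            except UnicodeDecodeError:
                continue
            if ch == "\\":
                out.append("\\\\")      # keep the escaping reversible
            elif ch.isprintable():      # assumption: printable == displayable
                out.append(ch)
            elif ord(ch) <= 0xFFFF:
                out.append("\\u%04X" % ord(ch))
            else:
                out.append("\\U%08X" % ord(ch))
            i += width
            break
        else:
            # No valid sequence starts here: escape this single byte.
            out.append("\\x%02x" % raw[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for name in (b"\xf3\xa1\x88\xb4", b"a\x88b", "\u00e7a.txt".encode("utf-8")):
        print(escape_filename(name))
```

With this sketch the second example from the quoted mail,
\xf3\xa1\x88\xb4, is shown as \U000E1234, while a stray byte such as
0x88 stays visible as \x88.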

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/            PGP Key available, key ID: 0x8F0E4975

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
