John Cowan <jcowan at reutershealth dot com> wrote:

> Windows filesystems do know what encoding they use. But a filename on
> a Unix(oid) file system is a mere sequence of octets, of which only 00
> and 2F are interpreted. (Filenames containing 20, and especially 0A,
> are annoying to handle with standard tools, but not illegal.)
>
> How these octet sequences are translated to characters, if at all,
> is no concern of the file system's. Some higher-level tools, such as
> directory listers and shells, have hardwired assumptions, others have
> changeable assumptions, but all are assumptions.

OK, fair enough. Under a Unixoid file system, a file name consists of a
more or less arbitrary sequence of bytes, essentially unregulated by the
OS. If interpreted as UTF-8, some of these sequences may be invalid, and
the files may be inaccessible.

This is *exactly* the same scenario as with GB 2312, or Shift-JIS, or
KS C 5601, or ISO 6937, or any other multibyte character encoding ever
devised. This is not a problem that needs to be solved within Unicode,
any more than it needed to be solved within those other encodings.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
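
For illustration only (this is not from the original thread): a minimal Python sketch of the point above, that the file system's rule and an encoding's rule are two separate checks. The helper names `is_legal_unix_name` and `decodes_as` are hypothetical, and the sample names are made up. The legality check covers only the two octets mentioned in the quoted text (00 and 2F); real systems also reserve "." and ".." and impose length limits, which the sketch ignores.

```python
def is_legal_unix_name(name: bytes) -> bool:
    """Hypothetical check: a non-empty path component with no 0x00 and no 0x2F ('/')."""
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


def decodes_as(name: bytes, encoding: str) -> bool:
    """Return True if the octets form a valid sequence in the given encoding."""
    try:
        name.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Made-up sample names, given as raw octets.
samples = [
    b"caf\xc3\xa9.txt",   # "e acute" as two UTF-8 octets
    b"caf\xe9.txt",       # "e acute" as one ISO 8859-1 octet
    b"two words\n.txt",   # contains 20 and 0A: annoying, but allowed
    b"a/b",               # contains 2F, so not a single file name
]

for raw in samples:
    verdicts = {enc: decodes_as(raw, enc) for enc in ("utf-8", "shift_jis", "gb2312")}
    print(raw, "legal:", is_legal_unix_name(raw), "decodes:", verdicts)
```

A name can pass the first check and fail the second under any of the multibyte encodings tried, which is why the situation is not specific to UTF-8.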

