Kaixo!

On Thu, Feb 21, 2002 at 03:10:32AM -0500, Glenn Maynard wrote:

> One thing that's bound to be lost in the transition to UTF-8 filenames:
> the ability to reference any file on the filesystem with a pure CLI.
> If I see a file with a pi symbol in it, I simply can't type that; I have
> to copy and paste it or wildcard it.  If I have a filename with all
> Kanji, I can only use wildcards.

Well, you will rarely have to manipulate files whose names include
characters you cannot type.
Usually you manage your own files, and you are the one who typed their names.

Kanji or the letter pi can very well be typed in a CLI environment,
using a Japanese input method (XIM) and a Greek keyboard layout respectively.

It isn't that much of a problem.
 
> A normalization form would help a lot, though. It'd guarantee that in

That however is indeed a problem.

A problem similar to case-insensitivity in Windows, where you could (at
least with old versions) load a file named one way and save it another
way; if you were using a case-sensitive fs (e.g. a fs on a Unix machine
mounted by SMB on the Windows machine), you ended up with different files
and a real mess.

The same thing could happen here; well, not as bad, as I don't think any
program will purposely *change* the chars composing a previously selected
filename (e.g. when doing "open" then "save" there wouldn't be any name
change). But when a user types a filename manually, the system may tell
him "no such file", and he will be puzzled, as he can see that there is
one: there is no visual difference between a precomposed character like
"aacute" and the two characters "a" and "combining acute accent".
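
To make the mismatch concrete, here is a minimal sketch in Python (my
choice of language for illustration; nothing in this thread depends on
it). It assumes the name is compared code point by code point, and
same_name() is a hypothetical helper, not an existing API:

```python
import unicodedata

precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE ("aacute")
decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT

def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)           # False: different code points
print(precomposed.encode("utf-8"))         # b'\xc3\xa1'
print(decomposed.encode("utf-8"))          # b'a\xcc\x81'
print(same_name(precomposed, decomposed))  # True: equal once normalized
```

Both strings render identically, yet their UTF-8 byte sequences differ,
which is all a byte-oriented filename lookup gets to see.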

This reminds me of a discussion about Pango and the ability to have
different view and edit modes: a normal one (with text showing as expected),
and another mode where combining chars are decomposed and invisible control
characters (such as ZWJ, etc.) are made visible.
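
A rough sketch of what such a "reveal" mode could do, again in Python
and purely as an illustration (this is not Pango's API; reveal() is a
hypothetical function). It assumes that "invisible" means Unicode
general category Cf (format characters) and that combining marks are
the M* categories:

```python
import unicodedata

def reveal(text: str) -> str:
    """Decompose text (NFD) and spell out combining marks and format chars."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        cat = unicodedata.category(ch)
        if cat.startswith("M") or cat == "Cf":
            # Show the character's Unicode name, or its code point if unnamed.
            out.append("<" + unicodedata.name(ch, "U+%04X" % ord(ch)) + ">")
        else:
            out.append(ch)
    return "".join(out)

print(reveal("\u00e1"))     # a<COMBINING ACUTE ACCENT>
print(reveal("a\u200db"))   # a<ZERO WIDTH JOINER>b
```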

> I don't know who would actually normalize filenames, though--a shell
> can't just normalize all args (not all args are filenames) and doing it
> in all tools would be unreliable.

The normalization should be done at the input method layer; that way it
will be transparent and, hopefully, if all OSes do the same, the potential
problem of duplicates will never arise.
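
As a sketch of that idea (Python, with a directory simulated as a set of
names; commit_input() is a hypothetical hook, not a real input method
API, and the choice of NFC is my assumption):

```python
import unicodedata

def commit_input(typed: str) -> str:
    """Hypothetical input-layer hook: normalize to NFC before delivery."""
    return unicodedata.normalize("NFC", typed)

# Name created through the normalizing input layer.
normalized_dir = {commit_input("a\u0301rbol.txt")}
# Same name created by a system that did not normalize (kept decomposed).
legacy_dir = {"a\u0301rbol.txt"}

typed = commit_input("\u00e1rbol.txt")
print(typed in normalized_dir)  # True: both sides went through the same layer
print(typed in legacy_dir)      # False: the stored name was never normalized
```

The second lookup shows why it only works if every system producing
filenames does the same.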


-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/            PGP Key available, key ID: 0x8F0E4975

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
