Kaixo!

On Thu, Feb 21, 2002 at 03:10:32AM -0500, Glenn Maynard wrote:
> One thing that's bound to be lost in the transition to UTF-8 filenames:
> the ability to reference any file on the filesystem with a pure CLI.
> If I see a file with a pi symbol in it, I simply can't type that; I have
> to copy and paste it or wildcard it. If I have a filename with all
> Kanji, I can only use wildcards.

Well, it won't happen often that you have to manipulate files whose
names include characters you cannot type. Usually you manage your own
files, and it is you who typed their filenames.

Kanji or the letter pi can very well be typed in a CLI environment,
using a Japanese XIM and a Greek keyboard layout respectively.
It isn't that much of a problem.

> A normalization form would help a lot, though. It'd guarantee that in

That, however, is indeed a problem. It is similar to the
case-insensitivity problem in Windows, where you could (at least with
old versions) load a file named one way and save it another way; if you
were using a case-sensitive fs (e.g. a fs on a unix machine mounted via
SMB on the Windows machine) you ended up with different files and a
real mess.

The same thing could happen here. Well, not as badly, as I don't think
any program will purposely *change* the characters composing a
previously selected filename (e.g. when doing "open" then "save" there
wouldn't be any name change). But when a user types a filename manually,
the system may tell him "no such filename", and he will be puzzled
because he can see that there is one: there is no visual difference
between a precomposed character like "aacute" and the two characters
"a" and "combining acute accent".

This reminds me of a discussion about pango and the ability to have
different view and edit modes: a normal one (with text showing as
expected), and another one where composed characters are shown
decomposed and invisible control characters (such as zwj, etc.) are
made visible.

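To make the "aacute" case concrete, here is a minimal sketch in Python
using the standard unicodedata module. The helper names same_name and
reveal are made up for this illustration; they do not come from any
existing tool, and choosing NFC as the normalization form is only an
assumption.

```python
import unicodedata

# Two strings that render identically but are different code point sequences.
precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE ("aacute")
decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the same form.

    Hypothetical helper; NFC is an assumed default, not a mandated choice.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def reveal(name: str) -> str:
    """Spell out every code point, in the spirit of a 'decomposed' view mode."""
    return " + ".join(
        unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in name
    )


if __name__ == "__main__":
    print(precomposed == decomposed)            # raw comparison
    print(same_name(precomposed, decomposed))   # comparison after normalization
    print(reveal(precomposed))
    print(reveal(decomposed))
```

The raw comparison fails even though both strings look the same on
screen, which is exactly the "no such filename" puzzle; comparing after
normalization makes them match, and reveal() shows which of the two
forms a given name actually uses.
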
> I don't know who would actually normalize filenames, though--a shell
> can't just normalize all args (not all args are filenames) and doing it
> in all tools would be unreliable.

The normalization should be done at the input method layer; that way
it will be transparent and, hopefully, if all OSes do the same, the
potential problem of duplicates will never arise.

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/    PGP Key available, key ID: 0x8F0E4975

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
