On Sat, Feb 23, 2002 at 10:18:28AM +0900, Gaspar Sinai wrote:
> This was just a suggestion to clean up things by
> specifying the characters that can be allowed for
> filenames. Currently we can not have "/", ".", ".."
> and "\0" for a filename. What if we say we can not

More precisely, you can't have "." or ".." as a filename, and you can't
have "/" or NUL *in* a filename. You can look at the first two as "these
files already exist" rather than as a restriction as such.

> have composing and zero width characters for a filename?

Er, composing characters are OK; NFC just avoids them when there's a
precomposed alternative available. (And Pablo said that there are some
zero-width characters that are useful in filenames ... which is rather
annoying.)

Why can't we do that? Because filenames would go from being nearly 8-bit
clean to having UTF-8 specific requirements. That's not the FS's job. And
this couldn't only be NFS: the problems you're describing would happen
with local FS's, too--and they need to work with all active charsets, not
just UTF-8.

> That would not need complicated normalization - just
> a character check.

The current restrictions on filenames have been around forever, are
unavoidable, and are the only things keeping filenames from being
completely 8-bit clean. (Normalization also involves changing text; the
existing restrictions are simply pass or fail.)

Aside: can a UTF-8 string ever grow longer due to being changed to NFC?
It's obvious that a wide char string can't, but it's not clear that this
holds with UTF-8 (and if so, that it always will.)

> The problem occurs if normalization does happen - and some programs
> may do normalization.

If any are normalizing to NFD, they should probably be changed to not do
that. Fixing that isn't the FS's job. But the filesystem, C library
calls, network protocols, etc. should *never* change filenames at all.
That stuff must remain 8-bit clean (as far as it is now.)

I'm not advocating any low-level constraints or normalization at all. I
just want to be able to use UTF-8 in filenames, without hitting filenames
that I can't use c+p to enter. That's not the FS's job to fix, it's the
interface's.

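On the aside about whether NFC can lengthen a UTF-8 string: a quick check with Python's standard `unicodedata` module suggests it can. This is only a sketch; the helper name `utf8_lengths` is made up for illustration, and the exact results depend on the Unicode data version shipped with the interpreter. The test character, U+0958 DEVANAGARI LETTER QA, has a canonical decomposition but is on Unicode's composition exclusion list, so NFC leaves it in decomposed form.

```python
import unicodedata


def utf8_lengths(text):
    """Return (bytes before, bytes after) NFC normalization, both as UTF-8.

    Hypothetical helper for illustration only.
    """
    nfc = unicodedata.normalize("NFC", text)
    return len(text.encode("utf-8")), len(nfc.encode("utf-8"))


# U+0958 decomposes canonically to U+0915 U+093C and is excluded from
# recomposition, so NFC turns one 3-byte character into two.
print(utf8_lengths("\u0958"))

# The usual case goes the other way: "e" plus U+0301 COMBINING ACUTE ACCENT
# composes to the single character U+00E9.
print(utf8_lengths("e\u0301"))
```

The first input also goes from one code point to two, so the wide-character case does not look immune either. Anything that sizes buffers on the assumption that NFC never grows a string would need checking.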
The simple solution, having tools escape zero-width chars and other
oddities, isn't quite good enough, because some of these characters are
useful in filenames. (I might settle for it myself--I don't use any
languages that need them--but it'd be nice to find a more general
solution.)

This isn't a new problem, it's new symptoms of an old one. The old ones
were fixed by escaping invalid byte sequences, spaces, and ASCII control
characters--the new symptoms just need to be worked out. (Invalid UTF-8
sequences aren't one of these new problems--ls already escapes those.)

-- 
Glenn Maynard

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
