>The terminal should renormalize everything (including pastes) to NFC.

Then how will I paste some wacky invalid filename into my terminal in order to, say, rm it? Like I was saying, pastes should not be normalized.

>Of course, it's reasonable for this to be an option, but NFC seems to
>be a sensible default, at least when connecting to Unix systems.

Normalization form D has some serious drawbacks: if you were to try to implement, say, Vietnamese using only combining characters, it would look horrible. The appearance, position, shape, and size of a combining accent depend on which letter it is being combined with, as well as on which other diacritics are being combined with that same letter.

NFC is most appropriate for some scripts, and NFD may be desirable for others. It would be better, IMO, if Unicode got rid of both forms and simply supported one representation of each possible glyph. (No combining characters unless they are the ONLY way to represent a particular glyph.) (Actually, no combining chars at all would be best, because it's simplest. Why not just assign more code space to the languages that need it?)

If you have a filesystem that forces NFD, then I would say it's a poorly designed filesystem, because the filesystem is way too low-level to care about things like that. Filenames should be a "string of bytes", and the UI conventions should allow one to distinguish between them. If you are on a system where NFC is the canonical form and you mount such a filesystem, you should see bakemoji, and not a name translated into some other normalization form.

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
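
To make the "string of bytes" point concrete, here is a rough sketch in Python of what happens when a paste is forced to NFC. It assumes a byte-string filesystem and UTF-8 names; the normalizing_paste() function is a hypothetical stand-in for a terminal's paste hook, not any real terminal's API, and the sample name is just an illustration.

```python
import unicodedata

def utf8_forms(name):
    """Return the UTF-8 bytes of name in NFC and in NFD."""
    nfc = unicodedata.normalize("NFC", name).encode("utf-8")
    nfd = unicodedata.normalize("NFD", name).encode("utf-8")
    return nfc, nfd

def normalizing_paste(pasted):
    """Hypothetical paste hook (an assumption) that forces NFC on pasted text."""
    return unicodedata.normalize("NFC", pasted)

# U+1EC7 is a precomposed Vietnamese letter: e with circumflex and dot below.
nfc_bytes, nfd_bytes = utf8_forms("Vi\u1ec7t")

# Suppose a file was created with the decomposed (NFD) name.
on_disk = nfd_bytes

# The user copies that name and pastes it into the normalizing terminal.
typed = normalizing_paste(on_disk.decode("utf-8")).encode("utf-8")

print(nfc_bytes)          # precomposed: one code point for the accented letter
print(nfd_bytes)          # decomposed: base letter plus two combining marks
print(typed == on_disk)   # False: the pasted name no longer matches the file
```

On a filesystem that compares names byte for byte, the normalized paste names a different (probably nonexistent) file, so an rm of the pasted name would miss the one actually on disk.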
