First of all, I really don't believe that preserving non-canonical
forms should be a consideration for any software.  There is not a single
reason to allow non-canonical forms to exist at all, while there are
several reasons to avoid them.  Even more so for foreign encodings in
filenames - if you are trying to store UTF-16 names on a system with
a UTF-8 locale, you should be converting, not escaping.  Doing otherwise
is just asking for trouble.
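
To illustrate "converting, not escaping", here is a minimal Python
sketch.  The function name is made up for this example, and it assumes
the incoming name is UTF-16 with a byte order mark:

```python
def utf16_name_to_utf8(raw: bytes) -> bytes:
    """Re-encode a UTF-16 filename (with BOM) as UTF-8 bytes.

    Hypothetical helper for illustration only.  Invalid input raises
    UnicodeDecodeError instead of being escaped and stored as-is.
    """
    # Decode to abstract code points, then encode in the local encoding.
    return raw.decode("utf-16").encode("utf-8")


# Example: a name as it might arrive from a UTF-16 system.
incoming = "файл.txt".encode("utf-16")
local_name = utf16_name_to_utf8(incoming)
```

The result is an ordinary UTF-8 name that can be typed and matched in
a UTF-8 locale.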

Next, I assume that the ability to enter filenames trumps the ability
to preserve the original filename on Unix-like systems.  In most cases
right now these two goals don't clash, because user input is normalized
from the very beginning in the IME.  That said, there may be exceptions.
E.g. several mail clients won't normalize a filename if the input
encoding matches the encoding of the attachment.  Thus, having received
a file with a non-ASCII filename from a Mac, you'll end up being unable
to address it from the shell even if it was typed using exactly the same
keyboard layout you use.  I don't see how this situation can be
justified.  The rare cases when original filenames must be preserved
byte for byte warrant some special handling (e.g. storing filenames
separately elsewhere or preserving the whole files with names and
attributes in some archive or other form of special database).
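
The Mac case can be reproduced in a few lines of Python.  This is only
a sketch: the filename is invented, and plain NFD stands in for
whatever decomposed form the sending system really used:

```python
import unicodedata

# What the user types in the shell: precomposed "é" (U+00E9), i.e. NFC.
typed = "caf\u00e9.txt"

# The same name as it might arrive from a system that stores decomposed
# names: "e" followed by a combining acute accent (U+0301).
received = unicodedata.normalize("NFD", typed)

# Both render identically, but the byte sequences differ, so a
# byte-wise filesystem lookup for `typed` will not find `received`.
bytes_differ = typed.encode("utf-8") != received.encode("utf-8")


def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to one canonical form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Normalizing the name once when the file is saved removes the need for
a comparison like `same_name` later on.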

Finally, provided that both ends of a network communication use
canonical forms for Unicode, storing a file remotely and then receiving
it back with its filename intact is simply a matter of normalization on
the receiver's side.  That is: if you prefer your local files in NFD,
and your NAS uses NFC, you should simply normalize filenames when you
receive files back.  The only potential problem here is the
"compatibility" normalizations (NFKC and NFKD), but these are already
problematic enough to be avoided in all cases where NFD or NFC do the
job.
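
A rough Python sketch of that round trip follows.  The names are made
up and `to_local_form` is a hypothetical helper.  The last lines show
why the compatibility forms are a different matter:

```python
import unicodedata


def to_local_form(name: str, form: str = "NFD") -> str:
    """Normalize a received filename to the locally preferred form."""
    return unicodedata.normalize(form, name)


# Local name in NFD, stored on a NAS that keeps names in NFC.
local = unicodedata.normalize("NFD", "Gr\u00fc\u00dfe.txt")
on_nas = unicodedata.normalize("NFC", local)

# Receiving it back: one normalization restores the local name exactly.
restored = to_local_form(on_nas, "NFD")

# Compatibility normalization is lossy: the "fi" ligature (U+FB01) and
# superscript two (U+00B2) are folded into plain characters, and the
# original name cannot be recovered afterwards.
fancy = "\ufb01le\u00b2.txt"
canonical = unicodedata.normalize("NFC", fancy)
folded = unicodedata.normalize("NFKC", fancy)
```

Here `restored` equals `local`, while `folded` is a different name
from `fancy` and `canonical` is not.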

-- 
Dmitrij D. Czarkoff
