>Joel Rees writes:
>> 2014/12/03 22:23 "Dmitrij D. Czarkoff" <czark...@gmail.com>:
>> >
>> > First of all, I really don't believe that preservation of non-canonical
>> > form should be a consideration for any software.
>>
>> There is no particular canonical form for some kinds of software.
>>
>> Unix, in particular, happens to have file name limitations that are
>> compatible with all versions of Unicode past 2.0, at least, in UTF-8, but
>> it has no native encoding.
>
>To me, the current state of affairs--where filenames can contain
>anything and the same filename can and does get interpreted differently
>by different programs--feels extremely dangerous. Moving to a single,
>well-defined encoding for filenames would make things simpler and
>safer. Well, it might. That's why we're discussing this carefully, to
>figure out whether something like this is actually workable.
>
>There are two kinds of features being discussed:
>
>1) Unicode normalization. This is analogous to case insensitivity:
>   multiple filenames map to the same (normalized) filename.
>
>2) Disallowing particular characters. Control characters 1-31 and
>   invalid UTF-8 sequences are popular examples.
>
>Maybe one is workable. Maybe both are, or neither.
>
>Say I have a hypothetical machine with the above two features
>(normalizing to NFC, disallowing 1-31/invalid UTF-8). Now I log into a
>typical Unix "anything but \0 or /" machine, via SFTP or whatever. What
>are the failure modes?
>
>The first kind is that I could type "get x" followed by "get y",
>where x and y are canonically the same in Unicode but represented
>differently because they're not normalized on the remote host. I would
>expect this to work smoothly: first I download x to NFC(x), and then
>y overwrites it.
>
>The second kind is that I could type "get z", where z contains an
>invalid character. How should my system handle this? Error as if I had
>asked for a filename that's too long? Come up with a new errno? I don't
>know, but on this hypothetical machine it should fail somehow.
>
>But creating new files is only part of the problem. If we still allow
>such names in existing files, we lose all the security/robustness
>benefits and just annoy ourselves with pointless restrictions.
>
>So say I mount a filesystem containing the same files x, y, and z. What
>happens?
>
> - Fail to mount? (Simultaneously simplest, safest, and least useful)
> - Hide the files? (Seems potentially unsafe)
> - Try to escape the filenames? (Seems crazy)
>
>Is it currently possible to take a hex editor and add "/" to a filename
>(as opposed to a pathname) inside a disk image? If that's possible, how
>do systems currently deal with it? Because it's the same problem.
>
>FAT32 has both case insensitivity and disallowed characters. How well
>does OpenBSD handle those restrictions? If not optimally, how can they
>be made better? If it already handles them with aplomb, is the same
>approach applicable to the scenarios above?
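[For reference, the "get x" / "get y" collision described above (feature 1) can be sketched in a few lines of Python; the filenames here are made up for illustration, and unicodedata is the standard-library normalization module:]

```python
import unicodedata

# Two byte-for-byte different names that are canonically equivalent:
x = "caf\u00e9"    # NFC form: precomposed U+00E9 (e with acute)
y = "cafe\u0301"   # NFD form: plain 'e' plus combining acute U+0301

print(x != y)  # True: distinct names on an "anything but \0 or /" filesystem
print(unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y))  # True
# On a host that normalizes filenames to NFC, "get x" followed by
# "get y" would both write to NFC(x), so y silently overwrites x.
```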
http://en.wikipedia.org/wiki/Where%27s_the_beef%3F

I mean, where are the diffs for all these issues?

Oh. There is no beef.

This is idle chatter hoping someone supplies some secret sauce that
makes a disparate audience with different demands all happy. Why don't
you guys go write some code and prove your points? Maybe this is simply
a very hard problem, and not one that will be solved by people who
simply talk about it.
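[As a concrete starting point, the second feature proposed in the quoted message--rejecting control characters 1-31, NUL, "/", and byte sequences that are not well-formed UTF-8--can be sketched in Python. This is a policy sketch with hypothetical names, not a proposed kernel diff:]

```python
def filename_ok(name: bytes) -> bool:
    """Hypothetical filename policy: reject NUL, control chars 1-31,
    '/', and anything that is not well-formed UTF-8."""
    if any(b <= 0x1f for b in name):   # catches NUL (0) and 1-31
        return False
    if b"/" in name:                   # path separator, never valid in a name
        return False
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

print(filename_ok(b"readme.txt"))    # True
print(filename_ok(b"bad\x01name"))   # False: control character
print(filename_ok(b"\xff\xfejunk"))  # False: invalid UTF-8
```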