>Joel Rees writes:
>> 2014/12/03 22:23 "Dmitrij D. Czarkoff" <czark...@gmail.com>:
>> >
>> > First of all, I really don't believe that preservation of non-canonical
>> > form should be a consideration for any software.
>> 
>> There is no particular canonical form for some kinds of software.
>> 
>> Unix, in particular, happens to have filename limitations that are
>> compatible, in UTF-8, with at least every version of Unicode past
>> 2.0, but it has no native encoding.
>
>To me, the current state of affairs--where filenames can contain
>anything and the same filename can and does get interpreted differently
>by different programs--feels extremely dangerous. Moving to a single,
>well-defined encoding for filenames would make things simpler and
>safer. Well, it might. That's why we're discussing this carefully, to
>figure out if something like this is actually workable.
>
>There are two kinds of features being discussed:
>
>1) Unicode normalization. This is analogous to case insensitivity:
>   multiple filenames map to the same (normalized) filename.
>
>2) Disallowing particular characters. 1-31 and invalid UTF-8 sequences
>   are popular examples.
>
>Maybe one is workable. Maybe both are, or neither.
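>
>As a rough illustration, here is a sketch in Python of the two
>features on such a hypothetical machine (the helper names are made
>up, and EILSEQ is just one plausible choice of errno):
>
>    import errno
>    import unicodedata
>
>    def normalize_name(name):
>        # Feature 1: map every spelling of a name to its NFC form,
>        # much like case folding maps "FOO" and "foo" together.
>        return unicodedata.normalize("NFC", name)
>
>    def validate_name(raw):
>        # Feature 2: reject invalid UTF-8 and control bytes 1-31.
>        try:
>            name = raw.decode("utf-8")
>        except UnicodeDecodeError:
>            raise OSError(errno.EILSEQ, "invalid UTF-8 in filename")
>        if any(0 < ord(c) < 32 for c in name):
>            raise OSError(errno.EILSEQ, "control char in filename")
>        return normalize_name(name)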
>
>Say I have a hypothetical machine with the above two features
>(normalizing to NFC, disallowing 1-31/invalid UTF-8). Now I log into a
>typical Unix "anything but \0 or /" machine, via SFTP or whatever. What
>are the failure modes?
>
>The first kind is that I could type "get x" followed by "get y",
>where x and y are canonically the same in Unicode but represented
>differently because they're not normalized on the remote host. I would
>expect this to work smoothly: first I download x to the local name
>NFC(x), and then downloading y overwrites it, because NFC(y) is the
>same name.
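>
>Concretely (a made-up example; U+00E9 is the precomposed form,
>U+0065 U+0301 the decomposed one):
>
>    import unicodedata
>
>    x = "caf\u00e9"      # "café" with precomposed é
>    y = "cafe\u0301"     # "café" as e + combining acute accent
>    assert x != y        # distinct names on the remote host
>    # ...but one name on mine, so the second get overwrites the first:
>    assert unicodedata.normalize("NFC", x) == \
>           unicodedata.normalize("NFC", y)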
>
>The second kind is that I could type "get z", where z contains an invalid
>character. How should my system handle this? Error as if I had asked for
>a filename that's too long? Come up with a new errno? I don't know, but
>in this hypothetical machine it should fail somehow.
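>
>With the validate_name() sketch from above, the local side of
>"get z" would refuse before creating anything, e.g.:
>
>    z = b"report\x07.txt"          # remote name containing byte 7
>    try:
>        local_name = validate_name(z)
>    except OSError as e:
>        print("get failed:", e.strerror)   # refuse the transfer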
>
>But creating new files is only part of the problem. If such names are
>still allowed on existing filesystems, we lose all the
>security/robustness benefits and just annoy ourselves with pointless
>restrictions.
>
>So say I mount a filesystem that already contains files like x, y,
>and z above. What happens?
>
> - Fail to mount? (Simultaneously simplest, safest, and least useful;
>   see the scan sketched below)
> - Hide the files? (Seems potentially unsafe)
> - Try to escape the filenames? (Seems crazy)
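>
>For the first option, one userland-flavored stand-in for a
>mount-time check (reusing validate_name() from the sketch above):
>
>    import os
>
>    def scan_names(root):
>        # Walk the raw directory entries; refuse the mount on the
>        # first bad name by letting the OSError propagate.
>        for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
>            for raw in dirnames + filenames:
>                validate_name(raw)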
>
>Is it currently possible to take a hex editor and add "/" to a filename
>(as opposed to a pathname) inside a disk image? If that's possible, how
>do systems currently deal with it? Because it's the same problem.
>
>FAT32 has both case insensitivity and disallowed characters. How well
>does OpenBSD handle those restrictions? If not optimally, how can the
>handling be improved? If it already handles them with aplomb, does
>the same approach apply to the scenarios above?
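>
>For comparison, a sketch of the FAT32 long-name rules (reserved set
>from the spec; real implementations upcase with a fuller table than
>the ASCII one used here):
>
>    FAT_FORBIDDEN = set('"*/:<>?\\|') | {chr(i) for i in range(32)}
>
>    def fat_ok(name):
>        # FAT32 long names may not contain reserved characters
>        # or control codes 0-31.
>        return not (set(name) & FAT_FORBIDDEN)
>
>    def fat_same(a, b):
>        # FAT32 lookups are case-insensitive.
>        return a.upper() == b.upper()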

http://en.wikipedia.org/wiki/Where%27s_the_beef%3F

I mean, where are the diffs for all these issues?

Oh.  There is no beef.

This is idle chatter hoping someone supplies some secret sauce that
makes a disparate audience with different demands all happy.


Why don't you guys go write some code and prove your points?
Maybe this is simply a very hard problem, one that won't be solved
by people who simply talk about it?
