On Sun, Nov 30, 2014 at 5:48 AM, Dmitrij D. Czarkoff <czark...@gmail.com> wrote:
> Ingo Schwarze said:
>> While the article is old, the essence of what Schneier said here
>> still stands, and it is not likely to fall in the future:
>>
>>   https://www.schneier.com/crypto-gram-0007.html#9
>
> Sorry, but this article is mostly based on a lack of understanding of
> Unicode.

Sometimes I have found myself wondering whether Bruce Schneier's lack
of erudition is studied.

At any rate, I've found that, when he says "I see smoke," there is
often fire somewhere in the vicinity.

>> that would directly run contrary to some of OpenBSD's most important
>> project goals:  Correctness, simplicity, security.
>
> Yes, Unicode is very complex.  Just complex enough that there is (to my
> knowledge) no single application that does it right in every aspect.

Considering that making a universal character encoding scheme is, in
and of itself, a self-contradictory project, they've done moderately
well, I think.

> That said, the standard provides just enough facilities to make
> filesystem-related aspects of Unicode work nicely, particularly in the
> case of utf-8.  E.g., the ability to enforce NFD for all operations on
> file names could actually make several things more secure by preventing
> homograph attacks.

I think this assertion is a bit optimistic, and not only because of
the caveat you give next.
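
To make the homograph point concrete, here is a minimal sketch of my
own (nothing from the standard beyond the code points): "café" can be
encoded precomposed (NFC, U+00E9) or decomposed (NFD, U+0065 followed
by combining U+0301). The two render identically, but to a filesystem
that compares bytes they are two distinct names:

    /*
     * Sketch: the same visible name, "café", in NFC and NFD.
     * Both render the same, but the byte sequences differ, so a
     * filesystem that compares bytes sees two distinct names.
     */
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            const char *nfc = "caf\xc3\xa9";   /* U+00E9 precomposed */
            const char *nfd = "cafe\xcc\x81";  /* e + U+0301 combining */

            printf("NFC: %zu bytes, NFD: %zu bytes\n",
                strlen(nfc), strlen(nfd));               /* 5 vs 6 */
            printf("byte-equal: %s\n",
                strcmp(nfc, nfd) == 0 ? "yes" : "no");   /* "no" */
            return 0;
    }

Enforcing a single normalization form would collapse those to one name,
which, as I understand it, is the win you are pointing at.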

> Unfortunately, there is no realistic hope that NFD will be enforced by
> every OS and filesystem out there any time soon, so at this stage file
> names with bytes outside the printable ASCII range will cause problems
> at some point.  On my systems I limit filenames to the [0-9A-Za-z~._/-]
> range.

Warning! Rambling ahead:

And now I find myself bemused again by my own recurring tendency to
conflate the file name database with more general-purpose database
indexes.

Fifteen years ago, I said to someone that the useful life of the
current encoding scheme in Unicode was about twenty-five years, and
that they/we should be looking for good ways to restructure it. I had
trouble then figuring out a way to disentangle the various
requirements, and I still don't see a clear way to do it. But I'm
inclined to think the original idea of a 16-bit encoding, while it
misjudged the number of characters actually in use, came close to
seeing the requirements of the system correctly.

I think we need an "international" encoding that uses a restricted
subset of the characters actually in use, and a structure that allows
simpler parsing of the international part of the encoding.

(And from here my thoughts get even less coherent. Sorry for the interruption.)

-- 
Joel Rees

Be careful when you look at conspiracy.
Look first in your own heart,
and ask yourself if you are not your own worst enemy.
Arm yourself with knowledge of yourself, as well.
