Tim Bray wrote:
>> That problem is that Unicode is stateful with complex and
>> indefinitely long term states
> Has this ever caused a real problem to a real programmer in real life?
Yes, of course. State information preserved between lines is
really annoying.
But you miss the point of my original mail:
: Unicode is not even finite state, which means some pattern
: matching and normalization problems are hard or insolvable.
That is, with Unicode, you cannot search strings in a reasonable
amount of time.
> I have written a whole bunch of mission-critical code that reads and
> generates UTF-8, and any correct implementation will have to deal with
> the fact that there is no necessary connection between the number of
> glyphs on the screen and bytes in its encoding.
You completely miss the point. It has nothing to do with the long
term state.
> It would be perfectly
> reasonable for an implementation to declare a limitation, for example
> that it will not process more than 32 trailing modifiers on any character,
> and this would not cause problems in production because sequences of
> such a length do not occur in the encoding of any known text.
I said "long term state", which, of course, is not confined to a
single character, with or without modifiers.
> Which is to say, Ohta's statement about statefulness is true, but the
> conclusion that this is a "problem" is erroneous. -Tim
Instead, your statement: "I have written a whole bunch of mission-
critical code that reads and generates UTF-8" is untrustworthy.
Of course, it is perfectly reasonable for an implementation to
declare a limitation, for example, that it will not process
non-ASCII characters, which may also be the assumption of your
code.
Masataka Ohta
_______________________________________________
Ietf mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/ietf