On 04/28/2010 01:55 AM, MacArthur, Ian (SELEX GALILEO, UK) wrote:

> OK - yes, this is a mess. I think the assumption was always that we were
> (somehow) going to make the input text utf8 "clean" when we read it,
> then the majority of the functions and methods would never have to worry
> about this stuff.
> So far, that doesn't seem to have worked!

I don't believe trying to make UTF-8 data "clean" is *ever* going to 
work, and misguided attempts to do so are probably the main reason I18N 
is nearly 20 years behind schedule (UTF-8 was invented nearly 20 years 
ago, believe it or not! And we STILL don't have Unicode filenames).

If you have an array of bytes and some combinations are "illegal", it 
does not help to pretend that some magical part of the computer hardware 
will make them not happen. Nor does it help to throw errors, refuse to 
display anything, and otherwise turn every bad byte into a 
denial of service. The whole idea would be *insane* for any other data 
structure or communication format (imagine if sending a file were 
aborted because it contained spelling errors), but for some reason the 
word "characters" causes otherwise intelligent programmers to turn into 
the most incredible morons (or idiot savants, really), and they will go 
to unbelievable contortions to pretend that the hardware is dividing up 
the data at irregular boundaries.
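As a sketch of what "just display something" means in practice, here is a minimal lenient UTF-8 decoder (my own illustration, not FLTK code; the function name and choices are assumptions): every byte is consumed, and any byte that does not start or complete a well-formed sequence becomes U+FFFD, the replacement character, instead of aborting the whole operation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-8 without ever failing: well-formed sequences become their
// code points, everything else becomes U+FFFD and decoding continues at
// the very next byte (so one bad byte can never hide the rest of the text).
std::vector<uint32_t> decode_utf8_lenient(const std::string& in) {
    std::vector<uint32_t> out;
    size_t i = 0, n = in.size();
    while (i < n) {
        unsigned char b = in[i];
        size_t len;
        uint32_t cp;
        if (b < 0x80)      { len = 1; cp = b; }                       // ASCII
        else if (b < 0xC0) { out.push_back(0xFFFD); ++i; continue; }  // stray continuation byte
        else if (b < 0xE0) { len = 2; cp = b & 0x1F; }
        else if (b < 0xF0) { len = 3; cp = b & 0x0F; }
        else if (b < 0xF8) { len = 4; cp = b & 0x07; }
        else               { out.push_back(0xFFFD); ++i; continue; }  // illegal lead byte
        if (i + len > n)   { out.push_back(0xFFFD); ++i; continue; }  // truncated at end of buffer
        bool ok = true;
        for (size_t k = 1; k < len; ++k) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }  // resync at the next byte
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Note that the error path never throws and never stops: a bad byte costs exactly one replacement character, which is the behavior users of a text widget actually want.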

Another place to look is at the users of UTF-16. They don't worry about 
errors, and they handle them just fine (UTF-16 can, in theory, contain 
errors in the form of unmatched surrogate halves). The same approach 
works for UTF-8.
_______________________________________________
fltk-dev mailing list
[email protected]
http://lists.easysw.com/mailman/listinfo/fltk-dev