DO NOT REPLY TO THIS MESSAGE.  INSTEAD, POST ANY RESPONSES TO THE LINK BELOW.

[STR New]

Link: http://www.fltk.org/str.php?L2348
Version: 1.3-current


I can see four cases:

1: correct UTF-8 source text
2: ASCII with some encoding for characters 128 and above
3: some multibyte encoding
4: defective UTF-8 encoded text

Loading any of these must be robust: FLTK must not crash. The UTF-8
functions are not very robust at all, so we must make sure that all text
is always legal UTF-8. We can fix that relatively easily for 8-bit ASCII
(as seen in the recent patch).
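
To make "legal UTF-8" concrete, here is a minimal sketch of a check that
separates case 1 from cases 2-4. The names utf8_sequence_length and
is_legal_utf8 are hypothetical helpers, not FLTK functions, and the rules
applied (no overlong forms, no surrogates, nothing above U+10FFFF) are the
usual ones for well-formed UTF-8, assumed here rather than taken from the
FLTK sources.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of FLTK): returns the length (1..4) of the
// legal UTF-8 sequence starting at s[i], or 0 if the bytes there are illegal.
static std::size_t utf8_sequence_length(const std::string &s, std::size_t i) {
  const unsigned char c = static_cast<unsigned char>(s[i]);
  if (c < 0x80) return 1;                         // plain 7-bit ASCII
  std::size_t len;
  unsigned long cp;
  if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
  else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
  else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
  else return 0;               // stray continuation byte or invalid lead byte
  if (i + len > s.size()) return 0;               // sequence is truncated
  for (std::size_t k = 1; k < len; ++k) {
    const unsigned char t = static_cast<unsigned char>(s[i + k]);
    if ((t & 0xC0) != 0x80) return 0;             // not a continuation byte
    cp = (cp << 6) | (t & 0x3F);
  }
  if (len == 3 && cp < 0x800) return 0;           // overlong 3-byte form
  if (len == 4 && cp < 0x10000) return 0;         // overlong 4-byte form
  if (cp >= 0xD800 && cp <= 0xDFFF) return 0;     // UTF-16 surrogate range
  if (cp > 0x10FFFF) return 0;                    // beyond Unicode
  return len;
}

// Hypothetical helper: true if the whole buffer is legal UTF-8 (case 1).
bool is_legal_utf8(const std::string &s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const std::size_t n = utf8_sequence_length(s, i);
    if (n == 0) return false;
    i += n;
  }
  return true;
}
```

A loader could run such a check once on the incoming buffer and only fall
back to one of the strategies below when it fails.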

What should happen if the user tries to load text for any of cases 2-4?
a: don't load the text at all and output a message 
b: load the text, but warn that it contains unknown characters
c: load the text (some or many characters may look wrong), and only warn
if the user modifies or reads the text (save, DnD, etc.)
d: convert illegal UTF-8 sequences into the UTF-8 "illegal character"
code, followed by an educated guess. This text may look odd, but it could
be decoded and saved again without changes to the original text. It could
even be edited, but non-ASCII text would generate wrong codes. This is
similar to writing \t for tab, \000 for nul, etc., only this is UTF-8.
e: add a Fl_Text_Converter class hierarchy that offers conversion as in d,
but can also be overridden to offer any other character encoding.

Fl_Text_Converter -> Fl_Text_Converter_16bit -> Fl_Text_Converter_UTF16
                  -> Fl_Text_Converter_8bit  -> Fl_Text_Converter_CP1252
                                             -> Fl_Text_Converter_MacRoman
...
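
A rough sketch of what the top of such a hierarchy might look like. None of
these classes exist in FLTK; the method names to_utf8, from_utf8 and
unicode_for are assumptions made for illustration, and
Fl_Text_Converter_Latin1 is a hypothetical leaf class chosen only because
its table is the identity mapping. Input to from_utf8 is assumed to be legal
UTF-8 already.

```cpp
#include <cstddef>
#include <string>

// Proposed base class (sketch only): converts between a foreign encoding
// and the UTF-8 that the text widgets use internally.
class Fl_Text_Converter {
public:
  virtual ~Fl_Text_Converter() {}
  virtual std::string to_utf8(const std::string &raw) const = 0;
  virtual std::string from_utf8(const std::string &utf8) const = 0;
};

// Shared logic for single-byte encodings: a subclass only supplies the
// byte-to-code-point table through unicode_for().
class Fl_Text_Converter_8bit : public Fl_Text_Converter {
public:
  std::string to_utf8(const std::string &raw) const override {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i)
      append_utf8(out, unicode_for(static_cast<unsigned char>(raw[i])));
    return out;
  }
  std::string from_utf8(const std::string &utf8) const override {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size())
      out += static_cast<char>(byte_for(decode_utf8(utf8, i)));
    return out;
  }
protected:
  virtual unsigned unicode_for(unsigned char byte) const = 0;
private:
  // Reverse lookup by linear search; unmappable code points become '?'.
  unsigned char byte_for(unsigned cp) const {
    for (unsigned b = 0; b < 256; ++b)
      if (unicode_for(static_cast<unsigned char>(b)) == cp)
        return static_cast<unsigned char>(b);
    return '?';
  }
  static void append_utf8(std::string &out, unsigned cp) {
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  // Decodes one code point and advances i; assumes legal UTF-8 input.
  static unsigned decode_utf8(const std::string &s, std::size_t &i) {
    const unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
    unsigned cp = c & (0x3F >> extra);
    while (extra-- > 0 && i < s.size())
      cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
  }
};

// Hypothetical leaf class: ISO 8859-1, where byte value == code point.
class Fl_Text_Converter_Latin1 : public Fl_Text_Converter_8bit {
protected:
  unsigned unicode_for(unsigned char byte) const override { return byte; }
};
```

With this shape, another single-byte code page would only need to override
unicode_for() with its own 256-entry table, while a 16-bit branch would
implement to_utf8() and from_utf8() directly.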

I would suggest d for 1.3.0 and e for the next version.
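
For illustration, option d could look roughly like the sketch below. It
rests on two assumptions that are mine, not settled in this STR: the
"illegal character" code is taken to be U+FFFD (bytes EF BF BD), and the
educated guess is that each offending byte is ISO 8859-1. The function names
are hypothetical and not part of FLTK.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of the legal UTF-8 sequence at s[i], 0 if illegal.
static std::size_t legal_seq_len(const std::string &s, std::size_t i) {
  const unsigned char c = static_cast<unsigned char>(s[i]);
  if (c < 0x80) return 1;
  std::size_t len;
  unsigned long cp;
  if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
  else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
  else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
  else return 0;
  if (i + len > s.size()) return 0;
  for (std::size_t k = 1; k < len; ++k) {
    const unsigned char t = static_cast<unsigned char>(s[i + k]);
    if ((t & 0xC0) != 0x80) return 0;
    cp = (cp << 6) | (t & 0x3F);
  }
  if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)) return 0;
  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
  return len;
}

// Loading: legal sequences pass through; every illegal byte becomes
// U+FFFD followed by the byte reinterpreted as U+0080..U+00FF.
std::string escape_illegal_utf8(const std::string &raw) {
  std::string out;
  std::size_t i = 0;
  while (i < raw.size()) {
    const std::size_t n = legal_seq_len(raw, i);
    if (n > 0) { out.append(raw, i, n); i += n; continue; }
    const unsigned char b = static_cast<unsigned char>(raw[i++]); // always >= 0x80
    out += "\xEF\xBF\xBD";                       // assumed marker: U+FFFD
    out += static_cast<char>(0xC0 | (b >> 6));   // the guess, as 2-byte UTF-8
    out += static_cast<char>(0x80 | (b & 0x3F));
  }
  return out;
}

// Saving: turn each "U+FFFD + guessed character" pair back into one raw byte.
std::string unescape_illegal_utf8(const std::string &text) {
  std::string out;
  std::size_t i = 0;
  while (i < text.size()) {
    if (i + 5 <= text.size() && text.compare(i, 3, "\xEF\xBF\xBD") == 0) {
      const unsigned char hi = static_cast<unsigned char>(text[i + 3]);
      const unsigned char lo = static_cast<unsigned char>(text[i + 4]);
      if ((hi == 0xC2 || hi == 0xC3) && (lo & 0xC0) == 0x80) {
        out += static_cast<char>(((hi & 0x03) << 6) | (lo & 0x3F));
        i += 5;
        continue;
      }
    }
    out += text[i++];
  }
  return out;
}
```

One known weakness of this sketch: a file that legitimately contains U+FFFD
directly followed by a character in U+0080..U+00FF would be collapsed to a
single byte on save, so a real implementation would need a less ambiguous
marker or an extra escape for that case.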

BTW, whichever we do, we probably ought to apply it to Fl_Input as well.



_______________________________________________
fltk-bugs mailing list
[email protected]
http://lists.easysw.com/mailman/listinfo/fltk-bugs
