> I was about to answer "The problem is..." but then I thought of some
> more and felt a Monty Python Spanish Inquisition moment coming on :-)

So long as I get the comfy chair, then...

> The first problem here is that the various macros to handle extended
> mappings, i.e. ERRORS_TO_ISO8859_1, ERRORS_TO_CP1252 and STRICT_RFC3629,
> only apply to the fl_utf8decode() function in fl_utf.c [from FLTK2 ?]
> The other functions there, e.g. fl_utf8fwd() and fl_utf8back(), assume
> they have true utf-8 sequences only, and I don't think they will handle
> isolated CP1252 0x80-0x9f characters properly. But I need to check this.

OK - yes, this is a mess. I think the assumption was always that we were
(somehow) going to make the input text utf8 "clean" when we read it, then
the majority of the functions and methods would never have to worry about
this stuff. So far, that doesn't seem to have worked!

> The second problem is that fl_utf.c isn't the only source of functions.
> There's also O'ksi'D's fl_utf8.cxx implementation, which includes the
> fl_utf8len(char c) function. As Bill pointed out, fl_utf8len() does
> not have the full context to determine whether a byte is a CP1252
> 0x80-0x9f byte or a utf-8 trailing byte. According to fl_utf8len()
> the length of a utf-8 trailing byte is -1. That's the issue here.

I confess I thought that fl_utf8len() was not actually used anywhere.
But grep says it is... In the Fl_Text_* stuff, and in Fl_Input_.

I had tried to make the 1.1.8-utf8 stuff use the fltk2 functions as much
as possible, and I thought I'd more or less "deprecated" the oksid ones,
but it rather looks like I only did half a job there.

> The third problem: the Fl_Text_* code not only uses fl_utf8len() but
> also does a lot of its own bit testing against 0x80 and 0x40 masks,
> which muddies the waters rather a lot.

Yes...

> And finally, the Fl_Text_* code is also doing some "smart" expansion
> of specific characters, tab to spaces, 0x01-0x1f and DEL 0x7f to
> readable mnemonic forms, and then trying to handle top-bit "utf-8"
> characters using fl_utf8len().
> There's no testing for the CP1252 0x80-0x9f characters first, and the
> ERRORS_TO_CP1252 macro is not defined in these files anyway.

Urgh...

> And finally plus one :-) I haven't checked any of the text handling
> in any of the other widgets at all.

At some point in the past I had convinced myself that they were OK, but
now I am not sure. The fact that fl_utf8len() appears in Fl_Input_ makes
me worry...
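
To make the fl_utf8len() context problem concrete, here is a minimal sketch in C. It is not the FLTK code: sketch_byte_len() and sketch_seq_len() are hypothetical names invented for illustration. The first mimics a per-byte lookup that can only report -1 for a byte in the 0x80-0xBF range. The second looks at the following bytes, so an isolated 0x80-0x9f byte can be counted as one single-byte character, which is roughly what I would expect an ERRORS_TO_CP1252-style fallback to need. It only checks lead and trailing byte patterns, and does not attempt the overlong or surrogate checks that a strict RFC 3629 decoder would make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-byte lookup, for illustration only (not FLTK's code).
   Returns the sequence length implied by a lead byte, or -1 for a byte
   in 0x80-0xBF, which looks like a utf-8 trailing byte when seen alone. */
static int sketch_byte_len(unsigned char c)
{
    if (c < 0x80) return 1;   /* plain ASCII */
    if (c < 0xC0) return -1;  /* trailing byte, or a stray CP1252 byte? */
    if (c < 0xE0) return 2;   /* lead byte of a 2-byte sequence */
    if (c < 0xF0) return 3;   /* lead byte of a 3-byte sequence */
    if (c < 0xF8) return 4;   /* lead byte of a 4-byte sequence */
    return 1;                 /* 0xF8-0xFF: not utf-8, count one byte */
}

/* Hypothetical context-aware variant: checks the bytes that follow.
   Returns the length of a well-formed-looking sequence at p, or 1 when
   the byte cannot start one (stray 0x80-0xBF byte, truncated sequence,
   or a lead byte not followed by enough trailing bytes). A caller could
   then map that single byte through a CP1252 table. */
static int sketch_seq_len(const unsigned char *p, const unsigned char *end)
{
    int i, n;

    assert(p < end);                 /* need at least one byte to look at */
    n = sketch_byte_len(*p);
    if (n < 2) return 1;             /* ASCII, stray byte, or 0xF8-0xFF */
    if (end - p < n) return 1;       /* sequence truncated by end of buffer */
    for (i = 1; i < n; i++)
        if ((p[i] & 0xC0) != 0x80)   /* every trailing byte must be 10xxxxxx */
            return 1;
    return n;
}
```

With this, the byte 0x93 (a CP1252 left double quote) followed by ASCII text gives a length of 1 instead of -1, while the three bytes E2 80 9C (the same quote encoded as utf-8) still give 3. The point is only that the decision needs the neighbouring bytes, which a function taking a single char cannot see.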
