The previous fltk2 code would decode each byte of a bad UTF-8 sequence as though it were CP1252. This was changed so that each such byte is turned into an error character instead.
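
The two behaviors described above can be sketched roughly as follows. This is only an illustration, not the fltk2 source: the names `BadByte`, `kCP1252`, `decode_one` and `decode_lenient` are hypothetical, and using U+FFFD as the "error character" is an assumption, since the message does not say which character fltk2 draws. In both modes, every invalid byte produces exactly one output character and decoding continues with the next byte, so the rest of the string is still drawn.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical names throughout; this is not FLTK's actual API.
enum class BadByte { AsCP1252, AsErrorChar };

// CP1252 code points for bytes 0x80..0x9F; 0 marks the five unassigned slots.
static const char32_t kCP1252[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};

// Try to decode one well-formed UTF-8 sequence starting at s[i].
// Returns its length (1..4) and stores the code point, or returns 0 if invalid.
static std::size_t decode_one(const std::string& s, std::size_t i, char32_t& cp) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) { cp = b0; return 1; }
    std::size_t len;
    char32_t min;  // smallest code point allowed for this length (rejects overlongs)
    if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0)      { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;                     // stray continuation byte or illegal lead byte
    if (i + len > s.size()) return 0;  // sequence truncated at end of string
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Decode s as UTF-8. Each byte that is not part of a valid sequence yields
// exactly one output character, chosen according to mode.
std::u32string decode_lenient(const std::string& s, BadByte mode) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        char32_t cp = 0;
        std::size_t n = decode_one(s, i, cp);
        if (n == 0) {
            const unsigned char b = static_cast<unsigned char>(s[i]);
            if (mode == BadByte::AsErrorChar) {
                cp = 0xFFFD;  // assumption: U+FFFD stands in for the error character
            } else if (b >= 0x80 && b <= 0x9F && kCP1252[b - 0x80] != 0) {
                cp = kCP1252[b - 0x80];
            } else {
                cp = b;  // outside 0x80..0x9F, CP1252 agrees with ISO-8859-1
            }
            n = 1;  // consume only the offending byte, then resynchronize
        }
        out.push_back(cp);
        i += n;
    }
    return out;
}
```

With this sketch, a lone 0x80 byte decodes to the Euro sign in `BadByte::AsCP1252` mode and to the error character in `BadByte::AsErrorChar` mode, while the valid sequence E2 82 AC decodes to the Euro sign in both.
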

I definitely do not like it aborting or hiding any invalid bytes. They must draw something, and the rest of the string must be drawn. The CP1252 fallback certainly makes it far easier to be back-compatible, but as Michael states, it makes it easy to keep using non-UTF-8 text. Printing error indicators might be best.

imacarthur wrote:
> On 7 Oct 2008, at 22:23, Michael Sweet wrote:
>
>> Bill Spitzak wrote:
>>> ...
>>> I believe this is what is happening. In fltk2 it does not use the Xft
>>> UTF-8 functions, it works similar to the other platforms. This allows it
>>> to display most ISO-8859-1 correctly. More importantly, it will draw
>>> something for a piece of text with invalid UTF-8 in it. The Xft
>>> functions quit if they see invalid UTF-8, which is not very desirable
>>> behavior.
>>
>> Depends on what you want. I'd rather see consistent behavior (don't
>> display invalid UTF-8) than try to support both ISO-8859-1 (or your
>> favorite 8-bit encoding) and UTF-8 inconsistently.
>>
>> Moreover, if FLTK allows a mix of UTF-8 and 8-bit characters, it will
>> be very likely that text fields and other widgets will end up with a
>> mix, leading to really interesting problems when those values are
>> written to a file...
>>
>> So, I'm -1 on supporting both ISO-8859-1 and UTF-8 at the same time
>> through any kind of auto-detect code.
>
> In principle I agree with Michael on this one, in that I'd rather be
> "pure" UTF-8 and choke on invalid sequences...
>
> But I worry that, at least in the short term, there's an awful lot of
> text out there (not ISO-8859-1 admittedly but CP1252, Mac-Roman, etc.,
> possibly even DOS CP 850?) with characters in the 0x80-0x9F range that
> UTF-8 will ignore as control chars but which *might* map to usable chars
> in some or other encoding. Although quite how we know *what* other
> encoding...
> This might matter to me - the Euro sign "€" appears at 0x80 in CP1252
> for example. IIRC, HTML-5 calls up the CP1252 "interpretation" for text
> that claims to be ISO-8859-1, so the 0x80 to 0x9F range might be "valid"
> in that context.
>
> However, that said, it's not clear that any of this applies to
> Albrecht's specific example as his errant chars were not in this range -
> clearly there's something I've messed up elsewhere, and I don't know what.
