Me:
>> After updating the background info in unicode.dox last week,
>> I was hoping to start on at least listing the UTF8 functions
>> on the page, [with FLTK2 and OksiD tags showing their origin
>> for the moment], to at least gather the things into a single
>> place for reference. But if you are already working on it...

Matt:
> Yeah, we had the same idea. I like your approach more than mine
> (gather the information first). My strategy is to fix Fl_Input
> and Fl_Text_xxx and reorganize whenever I need something. This
> is potentially incomplete and requires major cleaning up every
> once in a while. A more systematic approach would be nice.

I was trying to understand Fl_Text_Display.cxx and failing miserably
without easy-to-access documentation about the UTF-8 functions...

Me:
>> Also, is the plan to "merge" the FLTK2 stuff into the OksiD
>> code, or "merge" the OksiD code into the FLTK2 stuff, or produce
>> a hybrid third way. In other words, will the FLTK1 UTF8 code
>> present a similar API to the FLTK2 UTF8 code, or will it be a
>> completely new beast?

Matt:
> OK, I looked at the FLTK2 interface. I believe we should stick
> with and extend that API. In case we ever merge, we all have one
> headache less. One thing though that I would like to recommend:
> let's keep the Unicode API truly plain "C". It is included by
> other plain "C" interfaces within FLTK.

I've just been through unicode.dox, grouped the "similar" or
"related" UTF-8 functions, and provided brief summary information.
It might not be 100% correct, or in the optimum order, but at least
there is now an overview that people can refer to and comment on.

See http://www.fltk.org/newsgroups.php?s3940+gfltk.commit+v3959+T0

Several things occurred to me while doing this:

1. The OksiD and FLTK2 naming conventions are not consistent, and
   once you start talking about ucs, wc, mb, mbcs it's easy to get
   lost. Similarly when talking about len, which could be counted
   in bytes, wchar_t units, characters, etc. Getting your head
   around the differences between ASCII, ISO-8859-1, Unicode and
   UTF-8 is the biggest hurdle at the beginning. It would be useful
   to agree on some terminology and variable names.

2. The OksiD code has an explicit note that it is limited to 24-bit
   Unicode, with probably only 16 bits needed in practice for Linux
   and Win32, but the fl_utf_strcasecmp() comments about 3*len seem
   to imply the 16-bit limit (0xFFFF is a 3-byte sequence in UTF-8).
   Or maybe toupper() and tolower() just don't mean anything for
   0x10000 and above, so the size can never increase beyond 3*len.
   The FLTK2 code appears to handle a wider range of characters,
   e.g. fl_utf8toUtf16() mentions 0x10000 to 0x10ffff, but the
   lookup tables don't seem as comprehensive as those from OksiD.

3. Should we consider splitting the UTF-8 stuff out into its own
   library, as there don't appear to be many fast, light UTF-8 ones
   out there? Or bear it in mind as something for the future.

Cheers
D

_______________________________________________
fltk-dev mailing list
http://lists.easysw.com/mailman/listinfo/fltk-dev
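
A footnote on point 1, to show why a bare "len" is ambiguous: the
same UTF-8 string has one length in bytes and another in characters.
The sketch below is only an illustration in plain C. The function
name utf8_count_chars() is made up for this note and is not part of
the FLTK2 or OksiD code, and it assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT an FLTK or OksiD function.
 * Counts code points in a NUL-terminated UTF-8 string by counting
 * every byte that is not a continuation byte (10xxxxxx).
 * Assumption: the input is well-formed UTF-8; no validation is done. */
static size_t utf8_count_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        /* Lead bytes and ASCII bytes start a new character. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the Euro sign U+20AC, encoded as the bytes E2 82 AC, strlen()
reports 3 while this helper reports 1. Neither is wrong; they are
simply measuring different things, so variable names along the lines
of nbytes and nchars would avoid the confusion.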

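
A footnote on point 2, to make the 3*len reasoning concrete: every
code point up to 0xFFFF needs at most 3 bytes in UTF-8, and anything
from 0x10000 to 0x10FFFF needs 4. The sketch below is plain C written
for this note. utf8_encoded_len() is a hypothetical name, not an
existing FLTK2 or OksiD function, and for brevity it does not reject
the surrogate range 0xD800 to 0xDFFF.

```c
/* Hypothetical helper, NOT an FLTK or OksiD function.
 * Returns the number of bytes needed to encode one code point in
 * UTF-8, or 0 if the value lies beyond the Unicode range.
 * Assumption: surrogates (0xD800..0xDFFF) are not filtered out. */
static int utf8_encoded_len(unsigned long ucs)
{
    if (ucs < 0x80UL)       return 1;  /* ASCII */
    if (ucs < 0x800UL)      return 2;  /* e.g. Latin-1 supplement */
    if (ucs < 0x10000UL)    return 3;  /* rest of the 16-bit range */
    if (ucs <= 0x10FFFFUL)  return 4;  /* supplementary planes */
    return 0;                          /* not a Unicode code point */
}
```

So a buffer of 3*len bytes is only guaranteed to be large enough if
every character involved stays at or below 0xFFFF, which would be
consistent with either reading of the fl_utf_strcasecmp() comments
above. A buffer sized for the full Unicode range would need 4 bytes
per character.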