Re: [fltk.development] r6765 ... / some utf8 reorganisation

Duncan Gibson Sat, 18 Apr 2009 05:40:13 -0700

Me:
>> After updating the background info in unicode.dox last week,
>> I was hoping to start on at least listing the UTF8 functions
>> on the page, [with FLTK2 and OksiD tags showing their origin
>> for the moment], to at least gather the things into a single
>> place for reference. But if you are already working on it...


Matt:
> Yeah, we had the same idea. I like your approach more than mine
> (gather the information first). My strategy is to fix Fl_Input
> and Fl_Text_xxx and reorganize whenever I need something. This
> is potentially incomplete and requires major cleaning up every
> once in a while. A more systematic approach would be nice.

I was trying to understand Fl_Text_Display.cxx and failing miserably
without easy-to-access documentation about the UTF-8 functions...

Me:
>> Also, is the plan to "merge" the FLTK2 stuff into the OksiD
>> code, or "merge" the OksiD code into the FLTK2 stuff, or produce
>> a hybrid third way? In other words, will the FLTK1 UTF8 code
>> present a similar API to the FLTK2 UTF8 code, or will it be a
>> completely new beast?

Matt:
> OK, I looked at the FLTK2 interface. I believe we should stick
> with and extend that API. In case we ever merge, we all have one
> headache less. One thing though that I would like to recommend:
> let's keep the Unicode API truly plain "C". It is included by
> other plain "C" interfaces within FLTK.

I've just been through unicode.dox and grouped the "similar" or
"related" UTF-8 functions and provided brief summary information.
It might not be 100% correct, or in the optimum order, but at
least there is now an overview that people can refer to/comment on.
See http://www.fltk.org/newsgroups.php?s3940+gfltk.commit+v3959+T0

Several things occurred to me while doing this:

1. The OksiD and FLTK2 naming conventions are not consistent
   and once you start talking about ucs, wc, mb, mbcs it's easy
   to get lost. Similarly when talking about len that could be
   bytes, wchar_t, characters, etc. Getting your head around the
   differences between ASCII, ISO-8859-1, Unicode and UTF-8 is
   the biggest hurdle at the beginning. It would be useful to
   agree on some terminology and variable names.

2. The OksiD code has an explicit note that it is limited to 24-bit
   Unicode with probably only 16 bits needed in practice for Linux
   and Win32, but the fl_utf_strcasecmp() comments about 3*len seem
   to imply the 16-bit limit (0xFFFF is a 3-byte sequence in UTF-8).
   Or maybe toupper() and tolower() just don't mean anything for
   0x10000 and above so the size can never increase beyond 3*len.
   The FLTK2 code appears to handle a wider range of characters,
   e.g. fl_utf8toUtf16() mentions 0x10000 to 0x10ffff, but the
   lookup tables don't seem as comprehensive as those from OksiD.

3. Should we consider splitting the UTF-8 stuff out into its own
   library as there don't appear to be many fast light UTF-8 ones
   out there? Or bear it in mind as something for the future.
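
To make points 1 and 2 concrete, here is a minimal plain "C" sketch of
the byte/character distinction and of the 3-byte versus 4-byte boundary.
The names utf8_encoded_len() and utf8_char_count() are hypothetical and
are not taken from the OksiD or FLTK2 code. The sketch assumes valid
UTF-8 input and does not reject the surrogate range 0xD800 to 0xDFFF.

```c
#include <stddef.h>

/* Hypothetical helper, not an FLTK function: number of bytes needed to
 * encode one Unicode code point in UTF-8, or 0 if it is out of range.
 * Assumption: surrogates (0xD800..0xDFFF) are not treated specially. */
static int utf8_encoded_len(unsigned long ucs)
{
    if (ucs < 0x80UL)      return 1;  /* ASCII */
    if (ucs < 0x800UL)     return 2;  /* includes the ISO-8859-1 upper half */
    if (ucs < 0x10000UL)   return 3;  /* rest of the 16-bit range, up to 0xFFFF */
    if (ucs <= 0x10FFFFUL) return 4;  /* 0x10000 to 0x10FFFF */
    return 0;                         /* beyond the Unicode range */
}

/* Hypothetical helper: count characters (code points) in a
 * NUL-terminated UTF-8 string by skipping continuation bytes, which
 * always have the bit pattern 10xxxxxx. Assumes the input is valid. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;                      /* lead byte or ASCII byte */
    }
    return n;
}
```

For the string "caf\xC3\xA9" (an e-acute stored as two bytes), strlen()
reports 5 bytes while utf8_char_count() reports 4 characters, which is
exactly the ambiguity a bare "len" hides. Likewise, if case conversion
never produces anything above 0xFFFF, a 3*len buffer is sufficient
because no such character needs more than 3 bytes.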

Cheers
D
_______________________________________________
fltk-dev mailing list
[email protected]
http://lists.easysw.com/mailman/listinfo/fltk-dev