Re: [fltk.development] More fun with Unicode supplementary plane chars and surrogate pairs - OSX edition

Ian MacArthur Sun, 17 Apr 2011 13:24:25 -0700

On 17 Apr 2011, at 17:42, Duncan Gibson wrote:
> 
> Ian:
>> Were they? I don't recall that.
>>


>> So I'm only trying to make things consistent.
> 
> See "Unicode in FLTK" section http://www.fltk.org/doc-1.3/unicode.html
> but it could be that I documented this based on comments in the code,
> both FLTK-1.3.x and FLTK-2.x and not just on discussion in the forums.


Ah right - OK, I didn't know that was there... Hmm, well, it says:

The current implementation of Unicode / UTF-8 in FLTK will impose the following 
limitations:

        • An implementation note in the [OksiD] code says that all functions 
are LIMITED to 24 bit Unicode values, but also says that only 16 bits are 
really used under linux and win32. [Can we verify this?]

(response)
Saying we are "LIMITED to 24 bits" is misleading here; the full Unicode range, 
the basic plane plus all the supplementary planes, *by definition* tops out at 
U+10FFFF, which needs only 21 bits and so fits comfortably within 24. That is 
not really a limitation we have introduced.
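
As a quick check of that range arithmetic, here is a minimal sketch that counts 
the bits needed for the highest Unicode code point, U+10FFFF. The helper name 
bits_needed() is made up for illustration; it is not an FLTK function.

```cpp
#include <cstdint>

// Hypothetical helper (not part of FLTK): number of bits needed to
// represent v. Zero needs 0 bits by this definition.
constexpr unsigned bits_needed(std::uint32_t v) {
    return v ? 1u + bits_needed(v >> 1) : 0u;
}

// U+10FFFF is the last code point of plane 16, the top of the Unicode range.
static_assert(bits_needed(0x10FFFF) == 21, "whole Unicode range fits in 21 bits");
// U+FFFF is the last code point of the basic plane (BMP).
static_assert(bits_needed(0xFFFF) == 16, "the BMP fits in 16 bits");
```

So any storage of 21 bits or more (24 or 32 in practice) covers every plane.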

The linux code seems largely to use 32-bit chars anyway, and in my app code 
that exercises the supplementary planes, I've not seen any problems so far.

Both WinXX and OSX use UTF-16 in their native representations, but both are 
(more or less) capable of handling surrogates. OSX does, and with Manolo's 
fixes I think we do too.

Under Windows it is a bit more hairy - some APIs do handle surrogates, others 
don't (e.g. many GDI functions only handle the BMP and no surrogates.)
However, under the WIN32 API there are usually a dozen ways of doing anything, 
so with a little fishing it is usually possible to find a function that does 
what you want *and* can handle the surrogates.
I have one more "bug" I know about - measuring text extents of surrogate pairs 
- and I think we are done. Possibly. I have a workaround that I have not 
committed, that appears to be reasonably correct in all my tests.

I think we ought to try and cover the surrogates, because the code we had 
*nearly* did anyway and in some places thought it was when it wasn't! Also 
there are places where the comments suggest it did not when it does...



        • The [fltk2] fl_utf8encode() and fl_utf8decode() functions are 
designed to handle Unicode characters in the range U+000000 to U+10FFFF 
inclusive, which covers all UTF-16 characters, as specified in RFC 3629. Note 
that the user must first convert UTF-16 surrogate pairs to UCS.


(response)
AFAIK fltk-2 and fltk-1.3 use the same methods now; I imported the fltk-2 
methods when I merged OksiD's code into the 1.1.8 tree, in an attempt at 
commonality. 
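
To illustrate the note that surrogate pairs must be converted to UCS before 
encoding, here is a minimal sketch of that conversion followed by a UTF-8 
encode per RFC 3629. Both combine_surrogates() and encode_utf8() are 
hypothetical stand-ins written for this example; they are not the FLTK 
fl_utf8encode() implementation and make no claim about its behaviour.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: combine a UTF-16 high/low surrogate pair into one
// code point. Returns U+FFFD (REPLACEMENT CHARACTER) for an invalid pair.
std::uint32_t combine_surrogates(std::uint16_t hi, std::uint16_t lo) {
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return 0xFFFD;
    return 0x10000u + ((std::uint32_t(hi) - 0xD800u) << 10)
                    + (std::uint32_t(lo) - 0xDC00u);
}

// Hypothetical stand-in for a UTF-8 encoder: encodes one code point in
// 1 to 4 bytes. Lone surrogates and values above U+10FFFF become U+FFFD.
std::string encode_utf8(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, the pair 0xD83D 0xDE00 combines to U+1F600, which encodes as the 
four bytes F0 9F 98 80. Encoding each surrogate on its own would instead give 
two 3-byte sequences, which is not valid UTF-8.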



        • FLTK will only handle single characters, so composed characters 
consisting of a base character and floating accent characters will be treated 
as multiple characters;

(response)
Yup; composed glyphs, like ligation and so forth, are hard... Relatively easy 
for LGC languages, but scary hard for Indic or Semitic languages...
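
A small sketch of what "treated as multiple characters" means in practice: 
counting code points in a UTF-8 string. The helper count_code_points() is 
hypothetical (not an FLTK call) and assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in well-formed UTF-8 by counting
// every byte that is not a continuation byte (10xxxxxx).
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

With this, the precomposed "\xC3\xA9" (U+00E9) counts as one character, while 
the visually identical "e\xCC\x81" (U+0065 followed by the combining acute 
accent U+0301) counts as two.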


        • FLTK will only compare or sort strings on a byte by byte basis and 
not on a general Unicode character basis;

(response)
Yup - Greg suggests iconv, that might help. Also PanGo and so on, of course... 
Not for us though!
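
To show what byte-by-byte comparison does and does not give you, here is a 
minimal sketch. compare_bytes() is a hypothetical helper written for this 
example, not an FLTK function; it compares bytes as unsigned values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: compare two strings byte by byte as unsigned values.
// Returns a negative value, zero, or a positive value, like strcmp().
int compare_bytes(const std::string& a, const std::string& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}
```

On well-formed UTF-8 this orders strings by code point, so U+FFFF 
("\xEF\xBF\xBF") sorts before U+10000 ("\xF0\x90\x80\x80"). It is not a 
linguistic sort, though: "Z" sorts before "\xC3\xA9" (U+00E9), and the 
precomposed "\xC3\xA9" compares unequal to the decomposed "e\xCC\x81".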


        • FLTK will not handle right-to-left or bi-directional text;

(response)
Yup - though there are RTL text widgets in the fltk code now, they only handle 
the rendering of precomposed strings, but they do work, and do render in the 
RTL order.
Bi-dir is harder, though if the string is adequately pre-composed that should 
Just Work (for certain values of work, naturally.)



