On 28 Mar 2009, at 17:13, Duncan Gibson wrote:

> I've just had a look at some of the code in Fl_Text_Display.cxx.
> The first thing that struck me was just how much of the low-level
> utf-8 implementation details are exposed.

Yes, I agree, and I apologise... a lot of that code was picked up  
verbatim from the OkisD port, then I added a bit more. I ought to  
have cleaned it up...

> Someone unfamiliar with
> the bit-twiddling can probably make some sense of the utf_len()
> function, but there are lots of other places in the code that
> contain things like:
>
>     if (!((c & 0x80) && !(c & 0x40))) {
>         ok = 1;
>     } else {
>         ...
>     }

Is that really what it says? I'm not even sure what that is testing
for, now I look at it.
0x40 to 0x7F (matched by the second part of that test) matches the
part of the Basic Latin block that is essentially just non-numeric
ASCII, but including some punctuation.
0x80 being set means the character is possibly multi-byte and needs
to be checked.
However: note that in the Latin-1 Supplement block, 0x80 to 0x9F map
to control characters, which is a bit of a nuisance because in many
common pre-Unicode code pages (e.g. the cp125x pages) these values map
to commonly used supplemental characters (e.g. the Euro sign and many
other "everyday" glyphs lie in that range in many code pages).
So, in summary, I'm not quite sure what that test is going for... I
ought to look at it in context, I guess.
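
Read purely as a bit pattern, without the surrounding context, the
quoted expression appears to be true for every byte except those of
the form 10xxxxxx, which is the pattern UTF-8 uses for continuation
(trail) bytes. The sketch below checks that reading over all 256 byte
values. The helper names are hypothetical and are not taken from
Fl_Text_Display.cxx; only the expression inside quoted_test() comes
from the quoted code.

```cpp
// Hypothetical helper (not an FLTK function): true when the byte has
// the bit pattern 10xxxxxx, i.e. a UTF-8 continuation (trail) byte.
static bool is_utf8_continuation(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// The expression exactly as quoted above, wrapped in a function.
static bool quoted_test(unsigned char c) {
    return !((c & 0x80) && !(c & 0x40));
}

// Counts the byte values for which the quoted test disagrees with
// "not a continuation byte". A result of 0 means the two are equivalent.
static int count_mismatches() {
    int mismatches = 0;
    for (int i = 0; i < 256; ++i) {
        unsigned char c = static_cast<unsigned char>(i);
        if (quoted_test(c) != !is_utf8_continuation(c)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

If that reading holds in context, "ok = 1" would mean "this byte
starts a character (ASCII or a multi-byte lead byte)".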

> I suggest that one of the first passes through the code adds some
> is_xyz(c) functions (or macros if speed is really a factor) so that
> mere mortals like me have a chance of understanding the code.

This seems a very (very) good idea.
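
As a rough illustration of the suggestion, a handful of small inline
predicates could give the bit tests readable names. This is only a
sketch: every name below is hypothetical, none of them exists in FLTK,
and the predicates classify bytes by bit pattern only, without
validating that a sequence is well-formed UTF-8.

```cpp
// Hypothetical helpers, not existing FLTK functions.

// 0xxxxxxx: a plain 7-bit ASCII byte.
inline bool is_ascii(unsigned char c) {
    return (c & 0x80) == 0x00;
}

// 10xxxxxx: a trail (continuation) byte inside a multi-byte sequence.
inline bool is_utf8_trail(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// 11xxxxxx: the first byte of a multi-byte sequence (pattern only,
// invalid lead values are not rejected here).
inline bool is_utf8_lead(unsigned char c) {
    return (c & 0xC0) == 0xC0;
}

// Any byte that is not a trail byte begins a new character.
inline bool starts_character(unsigned char c) {
    return !is_utf8_trail(c);
}

// Example use: count characters in a NUL-terminated UTF-8 string by
// counting the bytes that start a character.
inline int count_characters(const char *s) {
    int n = 0;
    for (; *s != '\0'; ++s) {
        if (starts_character(static_cast<unsigned char>(*s))) {
            ++n;
        }
    }
    return n;
}
```

Inline functions are used here rather than macros because compilers
will normally inline calls this small, and the arguments stay
type-checked.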


Cheers,
-- 
Ian

_______________________________________________
fltk-dev mailing list
[email protected]
http://lists.easysw.com/mailman/listinfo/fltk-dev
