On 28 Mar 2009, at 17:13, Duncan Gibson wrote:
> I've just had a look at some of the code in Fl_Text_Display.cxx.
> The first thing that struck me was just how many of the low-level
> UTF-8 implementation details are exposed.
Yes, I agree, and I apologise... a lot of that code was picked up
verbatim from the OksiD port, then I added a bit more. I ought to
have cleaned it up...
> Someone unfamiliar with
> the bit-twiddling can probably make some sense of the utf_len()
> function, but there are lots of other places in the code that
> contain things like:
>
> if (!((c & 0x80) && !(c & 0x40))) {
>     ok = 1;
> } else {
>     ...
> }
Is that really what it says? I'm not even sure what that is testing
for, now I look at it.
0x40 to 0x7F (matched by the second part of that test) matches the
part of the Basic Latin block that is essentially just non-numeric
ASCII, but including some punctuation.
0x80 being set means the byte is possibly part of a multi-byte
character and needs to be checked.
However, note that in the Latin-1 Supplement block, 0x80 to 0x9F map
to control characters, which is a bit of a nuisance because in many
common pre-Unicode code pages (e.g. the cp125x pages) these values map
to commonly used supplemental characters (e.g. the Euro sign and many
other "everyday" glyphs lie in that range in those code pages...).
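To make the code-page point concrete, here is a minimal sketch comparing how a byte in the 0x80 to 0x9F range decodes under Latin-1 (ISO 8859-1) and under Windows code page 1252, taken here as one representative of the cp125x family. The function names are hypothetical, not FLTK API. The table follows the published Windows-1252 mapping; the five bytes cp1252 leaves undefined are passed through unchanged as C1 controls, which is an assumption of this sketch rather than the only convention. It assumes C++14 (a local array inside a constexpr function).

```cpp
#include <cstdint>

// Latin-1 (ISO 8859-1) maps every byte to the code point of the same value,
// so 0x80..0x9F become C1 control characters (U+0080..U+009F).
constexpr std::uint32_t latin1_to_unicode(unsigned char b) {
    return b;
}

// Windows-1252 differs from Latin-1 only in 0x80..0x9F, where it places
// printable characters. 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in
// cp1252; this sketch passes them through as C1 controls (an assumption).
constexpr std::uint32_t cp1252_to_unicode(unsigned char b) {
    constexpr std::uint16_t table[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,  // 0x80..0x87
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,  // 0x88..0x8F
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,  // 0x90..0x97
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,  // 0x98..0x9F
    };
    // Outside 0x80..0x9F, cp1252 and Latin-1 agree byte for byte.
    return (b >= 0x80 && b <= 0x9F) ? table[b - 0x80] : b;
}
```

For example, byte 0x80 decodes to U+0080 (a C1 control) as Latin-1 but to U+20AC (the Euro sign) as cp1252, which is exactly the clash described above.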
So, in summary, I'm not quite sure what that test is going for... I
ought to look at it in context, I guess.
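For reference, read bit by bit, `(c & 0x80) && !(c & 0x40)` is true only when the top two bits of c are `10`, that is for bytes 0x80 to 0xBF, which in UTF-8 are continuation bytes (never the first byte of a character). The negated test therefore asks "is c the start of a character, either plain ASCII or a multi-byte lead byte?"; what the surrounding code then does with that still depends on the context. The minimal C++14 sketch below checks this reading exhaustively at compile time; `quoted_test()` and `quoted_test_is_not_continuation()` are hypothetical wrapper names, not FLTK functions.

```cpp
// The condition quoted from Fl_Text_Display.cxx, wrapped in a function.
constexpr bool quoted_test(unsigned char c) {
    return !((c & 0x80) && !(c & 0x40));
}

// Compares the quoted test, for all 256 byte values, with the plain mask
// form "the top two bits are not 10", i.e. c is not a continuation byte.
constexpr bool quoted_test_is_not_continuation() {
    for (int b = 0; b < 256; ++b) {
        const unsigned char c = static_cast<unsigned char>(b);
        const bool not_continuation = (c & 0xC0) != 0x80;
        if (quoted_test(c) != not_continuation) return false;
    }
    return true;
}

// Fails to compile if the two forms ever disagree.
static_assert(quoted_test_is_not_continuation(),
              "quoted test == 'c is not a UTF-8 continuation byte'");
```

The same condition is more commonly written `(c & 0xC0) != 0x80`, which keeps the top two bits and compares them with the continuation pattern in one step.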
> I suggest that one of the first passes through the code adds some
> is_xyz(c) functions (or macros if speed is really a factor) so that
> mere mortals like me have a chance of understanding the code.
This seems a very (very) good idea.
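A minimal sketch of such is_xyz(c) helpers, assuming C++11 `constexpr` functions rather than macros. All names here are hypothetical, not existing FLTK API, and `utf8_sequence_length()` only illustrates what a utf_len()-style function could look like; the real utf_len() may differ. The lead-byte ranges follow RFC 3629.

```cpp
// Hypothetical helpers; none of these names exist in FLTK. Each one names
// a UTF-8 byte pattern so callers need not spell out the bit masks.

// 0xxxxxxx: a complete single-byte (ASCII) character.
constexpr bool is_ascii(unsigned char c) { return c < 0x80; }

// 10xxxxxx: a continuation byte, never the start of a character.
constexpr bool is_utf8_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Lead bytes allowed by RFC 3629 are 0xC2..0xF4; 0xC0, 0xC1 and 0xF5..0xFF
// can only begin overlong or out-of-range sequences.
constexpr bool is_utf8_lead(unsigned char c) { return c >= 0xC2 && c <= 0xF4; }

// Number of bytes in the sequence that starts with c, or 0 if c cannot
// start a valid character.
constexpr int utf8_sequence_length(unsigned char c) {
    return is_ascii(c)      ? 1
         : !is_utf8_lead(c) ? 0
         : c < 0xE0         ? 2
         : c < 0xF0         ? 3
         :                    4;
}
```

With these, the quoted test reads `if (!is_utf8_continuation(c)) { ok = 1; } else { ... }`. Because the functions are tiny and `constexpr`, a compiler will normally inline them to the same mask-and-compare a macro would produce, so speed alone may not force the choice of macros.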
Cheers,
--
Ian
_______________________________________________
fltk-dev mailing list
http://lists.easysw.com/mailman/listinfo/fltk-dev