xutf8

Duncan Gibson Tue, 20 Apr 2010 02:25:07 -0700

> Well... I still wonder if we need "src/xutf8/fl_wcwidth.c" at all?
>
> My preference would be to put that functionality into the existing
> file "src/fl_utf.c", which is kind of where I think it belongs
> anyway...


I kept it separate for testing purposes. fl_wcwidth() can certainly
be moved into src/fl_utf.c, and doing so would avoid changes to the
Makefiles.

> In particular, that would mean that the fl_wcwidth() function would
> be able to take account of the macros (defined in "src/fl_utf.c")
> that define our utf8 error handling policy, in particular
> ERRORS_TO_CP1252 or STRICT_RFC3629.
>
> If we have ERRORS_TO_CP1252 set (it normally *is* set) in
> "src/fl_utf.c" then for the code points in the 0x80 to 0x9F range
> the fl_wcwidth() function would need to return (+1) whereas at
> present it will return (-1), which will not be what we want...
> Indications are that there's *a lot* of text out there that claims
> to be utf8 but actually uses that C1 control chars range (0x80 to
> 0x9F) as per the CP1252 characters, so we probably do want to do this.
>
> Thoughts?

Well, so far, for solving the STR-2158 problems at least, I had only
identified one place where fl_wcwidth() would be needed, and seeing
as you have to call fl_utf8decode() to get the ucs value to pass to 
fl_wcwidth(), the extra testing is already done in fl_utf8decode().
[Replace mk_wcwidth() by fl_wcwidth() in the current code below.]
So as it stands now, the extra code is not needed in fl_wcwidth().


int Fl_Text_Buffer::character_width(const char *src, int indent, int tabDist)
{
  char c = *src;
  if ((c & 0x80) && (c & 0x40)) {       // first byte of UTF-8 sequence
    int len = fl_utf8len(c);
    int ret = 0;
    unsigned int ucs = fl_utf8decode(src, src+len, &ret);
    int width = 1; //   mk_wcwidth((wchar_t)ucs); // FIXME
    return width;
  }
  if ((c & 0x80) && !(c & 0x40)) {      // other byte of UTF-8 sequence
    return 0;
  }
  return character_width(c, indent, tabDist);
}

[Hmm. Now I look at it, it probably makes sense to simplify this code
further by also providing an "fl_wcwidth(const char* src)" version
which does the decoding too.]
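
Something along these lines is what I have in mind for the string
version. This is only a rough sketch: sketch_decode() is a made-up
stand-in for fl_utf8decode() (it does not reject overlong encodings or
apply any error policy), sketch_wcwidth_ucs() is a placeholder for the
per-code-point width lookup (no combining or double-width handling),
and none of the sketch_* names exist in FLTK.

```cpp
#include <cassert>

// Hypothetical stand-in for fl_utf8decode(): decode one UTF-8 sequence at
// src, store its byte length in *len and return the code point. Malformed
// or truncated input is returned as the single lead byte with *len = 1.
static unsigned sketch_decode(const char *src, const char *end, int *len) {
  assert(src && end && len && src < end);
  const unsigned char *p = (const unsigned char *)src;
  unsigned c = p[0];
  int n = c < 0x80 ? 1
        : (c & 0xE0) == 0xC0 ? 2
        : (c & 0xF0) == 0xE0 ? 3
        : (c & 0xF8) == 0xF0 ? 4 : 0;
  if (n == 0 || end - src < n) { *len = 1; return c; }
  // Keep only the payload bits of the lead byte.
  unsigned ucs = (n == 1) ? c : (c & (0xFFu >> (n + 1)));
  for (int i = 1; i < n; i++) {
    if ((p[i] & 0xC0) != 0x80) { *len = 1; return c; }  // not a continuation
    ucs = (ucs << 6) | (p[i] & 0x3Fu);
  }
  *len = n;
  return ucs;
}

// Placeholder width lookup: 0 for NUL, -1 for C0/C1 controls and DEL,
// 1 for everything else.
static int sketch_wcwidth_ucs(unsigned ucs) {
  if (ucs == 0) return 0;
  if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0)) return -1;
  return 1;
}

// String version: decode first, then look up the width.
static int sketch_wcwidth_str(const char *src, const char *end) {
  int len = 0;
  return sketch_wcwidth_ucs(sketch_decode(src, end, &len));
}
```

With that, the UTF-8 branch of character_width() would reduce to one
call on the string, with no separate decode step in the caller.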

So, yes, we could add the extra code, just in case someone is working
with an array of UCS values, rather than UTF-8 encoded strings, but
that brings us back to yesterday's observation. FLTK should concern
itself only with the display of UTF-8 characters, and let the user
implement UCS manipulation in icu4c or pango or whatever.
At least for FLTK-1.3...
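
For the record, if we did add the extra code for callers holding raw
UCS values, I imagine it would look roughly like the sketch below. The
names are invented for illustration: SKETCH_ERRORS_TO_CP1252 only
mimics the ERRORS_TO_CP1252 macro from src/fl_utf.c, and
sketch_base_wcwidth() is a simplified placeholder for the real width
table.

```cpp
#include <cassert>

// Hypothetical mirror of the ERRORS_TO_CP1252 policy macro.
#define SKETCH_ERRORS_TO_CP1252 1

// Placeholder width lookup: 0 for NUL, -1 for C0/C1 controls and DEL,
// 1 for everything else (no combining or double-width handling).
static int sketch_base_wcwidth(unsigned ucs) {
  assert(ucs <= 0x10FFFFu);  // must be inside the Unicode range
  if (ucs == 0) return 0;
  if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0)) return -1;
  return 1;
}

// Policy-aware wrapper: when the CP1252 fallback is enabled, code points
// 0x80..0x9F are treated as printable CP1252 characters of width 1
// instead of as C1 control characters.
static int sketch_policy_wcwidth(unsigned ucs) {
#if SKETCH_ERRORS_TO_CP1252
  if (ucs >= 0x80 && ucs <= 0x9F) return 1;
#endif
  return sketch_base_wcwidth(ucs);
}
```

The wrapper keeps the policy test out of the width table itself, so
the table could stay identical to the upstream version.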

Cheers
Duncan

Re: [fltk.development] [Library] r7536-branches/branch-1.3/src/xutf8
