
[STR New]

Link: http://www.fltk.org/str.php?L2158
Version: 1.3-current


I started to look at the wrap_mode(1,0) problem, and I think it will
entail a lot more changes along the lines of what I did before. I see
from the FLTK2 code that they work through the buffer one byte at a
time, handle that byte, and then adjust the offset if that byte is the
start of a UTF-8 sequence. On the one hand that's cleaner, but on the
other it still doesn't address the column width problem because that
requires knowing which UCS character we are dealing with.
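
To make the byte-at-a-time approach concrete, here is a rough sketch of
how the offset could be advanced by the length of each UTF-8 sequence.
The names utf8_seq_len() and utf8_count_chars() are made up for this
illustration and are not FLTK API. Note that it only steps over
characters; it still says nothing about column width.

```c
#include <stddef.h>

/* Hypothetical helper (not FLTK API): number of bytes in the UTF-8
 * sequence introduced by lead byte c, or 0 if c is a continuation
 * byte or can never start a valid sequence. */
static int utf8_seq_len(unsigned char c) {
  if (c < 0x80) return 1;               /* plain ASCII */
  if (c >= 0xC2 && c <= 0xDF) return 2; /* 2-byte lead */
  if (c >= 0xE0 && c <= 0xEF) return 3; /* 3-byte lead */
  if (c >= 0xF0 && c <= 0xF4) return 4; /* 4-byte lead */
  return 0;                             /* continuation or invalid */
}

/* Hypothetical helper: count characters (not bytes) in buf[0..len)
 * by stepping over one sequence at a time. Invalid or truncated
 * sequences are counted as one character per byte. */
static size_t utf8_count_chars(const char *buf, size_t len) {
  size_t i = 0, n = 0;
  while (i < len) {
    int step = utf8_seq_len((unsigned char)buf[i]);
    if (step == 0 || (size_t)step > len - i) step = 1;
    i += (size_t)step;
    n++;
  }
  return n;
}
```

For example, the six bytes of "a", U+00E9 and U+20AC in UTF-8 would be
counted as three characters.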

So I have two options. Either I provide a whole series of additional
routines that take a char* rather than a char, as I did above, and let
each level extract the full UTF-8 byte sequence as needed, or I check
for the UTF-8 sequence at the top level and pass the UCS value rather
than a char to a series of new routines. I still haven't made up my
mind on this one.
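
As an illustration of the second option only, a minimal sketch might
decode each sequence once at the top level and hand the UCS value to a
width routine. Everything here is an assumption for the example:
utf8_decode(), utf8_columns(), the width_fn callback type and the
demo_width() stand-in are invented names, not existing FLTK routines,
and the decoder does not reject overlong or surrogate encodings.

```c
#include <stddef.h>

/* Hypothetical decoder: read one UTF-8 sequence from s[0..len), store
 * the code point in *ucs and return the number of bytes consumed.
 * On malformed or truncated input the first byte is returned as-is
 * and 1 byte is consumed. Returns 0 only when len is 0. */
static int utf8_decode(const char *s, size_t len, unsigned int *ucs) {
  const unsigned char *p = (const unsigned char *)s;
  unsigned int c;
  int n, i;
  if (len == 0) { *ucs = 0; return 0; }
  c = p[0];
  if (c < 0x80) { *ucs = c; return 1; }
  else if (c >= 0xC2 && c <= 0xDF) { n = 2; c &= 0x1F; }
  else if (c >= 0xE0 && c <= 0xEF) { n = 3; c &= 0x0F; }
  else if (c >= 0xF0 && c <= 0xF4) { n = 4; c &= 0x07; }
  else { *ucs = c; return 1; }
  if ((size_t)n > len) { *ucs = p[0]; return 1; }
  for (i = 1; i < n; i++) {
    if ((p[i] & 0xC0) != 0x80) { *ucs = p[0]; return 1; }
    c = (c << 6) | (p[i] & 0x3Fu);
  }
  *ucs = c;
  return n;
}

/* Width routines take a UCS value, never a char. */
typedef int (*width_fn)(unsigned int ucs);

/* Stand-in width function for the example: control characters are
 * non-printable (-1), everything else is one column wide. */
static int demo_width(unsigned int ucs) {
  return (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0)) ? -1 : 1;
}

/* Top level: decode once, then pass the UCS value down. Characters
 * with a non-positive width add nothing to the column count. */
static int utf8_columns(const char *s, size_t len, width_fn width) {
  int cols = 0;
  size_t i = 0;
  while (i < len) {
    unsigned int ucs;
    int n = utf8_decode(s + i, len - i, &ucs);
    int w = width(ucs);
    if (w > 0) cols += w;
    i += (size_t)n;
  }
  return cols;
}
```

In a design like this the lower-level routines never see raw bytes, so
something like mk_wcwidth() could be plugged in as the width function
without each level having to re-parse the UTF-8 sequence.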

On a related note, let's go back to wcwidth() and mk_wcwidth().

The Linux man page for wcwidth() says that it returns 0 for U+0000,
the number of columns needed for printable wide characters, and -1
for non-printable characters. The behaviour also depends on LC_CTYPE.

Markus Kuhn's implementation also returns 0 for U+0000. It returns -1
only for the C0 and C1 control characters and DEL, and 0, 1 or 2 for
everything else. There is no reference to locale specifics like
LC_CTYPE in the code.

If I build Markus Kuhn's implementation into FLTK-1.3.x xutf8 code
and run the following:

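#include <stdio.h>
#include <locale.h>
#include <wchar.h>
#include <FL/Fl.H>
#include <FL/fl_utf8.h>

int main(int argc, char *argv[]) {
  // wcwidth() depends on LC_CTYPE; without this call the program
  // runs in the default "C" locale.
  setlocale(LC_ALL, "");
  // Compare both implementations over the whole BMP, U+0000..U+FFFF.
  for (unsigned int ucs = 0; ucs <= 0xFFFF; ucs++) {
    int w1 = wcwidth((wchar_t)ucs);
    int w2 = mk_wcwidth((wchar_t)ucs);
    if (w1 != w2)
      printf("U+%04x: wcwidth()=%2d, mk_wcwidth()=%2d\n", ucs, w1, w2);
  }
  return 0;
}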

I can see that there are a lot of characters for which the standard
wcwidth() returns -1 but mk_wcwidth() returns 0, 1 or 2. I don't know
whether the -1 results are due to the limited number of locales that I
have installed on this box.

Although I have the feeling that a platform- and locale(?)-neutral
implementation has its advantages, I wonder whether it might cause
problems where wcwidth() is already being used elsewhere in the
system, for example where a system editor and the FLTK app show two
different views of the same file. Should we be worried?

And finally, as far as I can see, there's no reason why we can't just
add Markus Kuhn's wcwidth.c to the xutf8 directory provided we keep
the copyright etc. I don't think there's even a need to edit the file.
Therefore, I have it all ready to be committed in the next few days.

Any comments?

