On 21.09.2011, at 14:50, Nikita Egorov wrote:

>> Without looking into it in more detail, I'm going to say docs problem -
>> it should not say "n characters" but should probably say something like
>> "the number of bytes needed to represent n characters in UTF8" or some
>> such thing... Or...?
> Yes, description should be replaced to "number of bytes", but I want
> to work with characters.

The docs are wrong. fl_draw is a low-level function and should not do
things like character counting. It is your responsibility to convert
the number of characters into the number of bytes. There are functions
for that in fltk_utf... . It also makes sense, because a string that
looks identical can be represented in various ways (for example, an
umlaut can be composed from the letter U plus a combining diaeresis, or
it can use the precomposed umlaut character - both are perfectly legal).
And only the caller can know how fl_draw is used. So, "bytes" would be
correct.

>>> To solve the problem we can convert all string to UTF-16 and
>>> then restrict its by the specified length.
>>
>> Which perhaps will not work either, since as soon as you hit any
>> character that is not on the BMP you will need a UTF16 surrogate pair,
>> and the same sort of problem occurs.
>
> But in UTF-16 all symbols have size two bytes. There is no problem to
> set specified size of string as opposed to UTF8 where every symbol can
> have own size (from 1 up to 5?) .

No, in UTF-16, characters can be composed as well, plus all characters
above 0xffff (IIRC) are represented by a four-byte sequence (a surrogate
pair), just like characters above 0x7f in UTF-8 are represented by
longer sequences. Chinese characters outside the BMP (the CJK extension
blocks) need four bytes in UTF-16, for example. (A UTF-8 sequence is one
to four bytes long.)

>> Indeed - the reason for having the "length" option in these methods is
>> explicitly so that the string does not need to be NULL terminated...
>
> It is one more issue - the current implementation ignores NULL byte.
> It prints exactly N bytes from the source. Such behavior makes some
> problems for me too. May be we should check for zero char and stop
> printing the rest?

No. This function was written specifically so that a NUL can be printed.
This may (or may not) be useful for certain terminal-style text
displays, who knows. But I do know that there was a reason to do it -
back then.

>> Not keen, if we are to have a clean UTF8 API (and I think we should)
>> then...
>
> What is the harm if we add one more function which accepts UTF16
> string ? In MS Win it would be part of the gd->draw(...) which would
> invoked after conversion.

It was a conscious and consensual decision to go with UTF-8. If we add a
single UTF-16 call, we will need to provide more and more calls, and we
will need to provide support calls for platforms that do not use UTF-16
(all Unixes, for example). This is certainly possible, but who will
maintain the code? AFAIK there are two functions to convert to and from
UTF-16 which you can use if you prefer UTF-16 when coding. Internally,
however, FLTK uses UTF-8 (even if it has to convert back to UTF-16 to
print text).

>>> Because in order to evaluate right length of
>>> text, user has to convert source line to the UTF-16, restrict
>>> size and convert into UTF-8 again to invoke fl_draw(s,x,y)
>>> where string will be converted one more time. Thus at the
>>> moment there is two unnecessary conversions.
>>
>> Hmm, we have functions that tell you the number of "characters" in a
>> UTF8 string, do these not help?
>
> I see no way to use it. Shortly - I have to print column of text lines
> restricted by specified length. I use the Courier font.
> The lines can contain any symbols - latin, cyrillic etc. I can't
> evaluate needed size of line in bytes if I use UTF8. Only via
> conversion into any accessible format with monosized symbols.

No, even if you use monospaced fonts, you cannot assume that the number
of characters times the width of the font will give you the width of the
string that will be rendered on screen. There are characters and
character combinations in Unicode that need more or fewer pixels, even
in monospaced fonts! There are even character sequences that have
different widths in different combinations, so simply adding up the
width of each individual character will not work. The only reliable way
to get the width of whatever is printed is to call fl_width() after
setting the font and size.

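To make the two points above concrete (convert characters to bytes
yourself, and measure the rendered string rather than counting
characters), here is a minimal sketch. It is not FLTK code: the names
utf8_seq_len, utf8_bytes_for_chars, utf8_fit_width and MeasureFn are
made up for illustration, and the measure callback is an assumed
stand-in for a wrapper around fl_width(const char*, int) called after
fl_font().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper: byte length of the UTF-8 sequence starting at lead
// byte c. Stray continuation bytes and invalid leads count as 1 byte so
// that callers always make progress.
static std::size_t utf8_seq_len(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

// Number of bytes taken by the first n_chars code points of s (len bytes).
// Stops early at the end of the buffer or at a truncated sequence.
std::size_t utf8_bytes_for_chars(const char* s, std::size_t len,
                                 std::size_t n_chars) {
    assert(s != nullptr);
    std::size_t pos = 0;
    while (pos < len && n_chars > 0) {
        std::size_t step = utf8_seq_len(static_cast<unsigned char>(s[pos]));
        if (pos + step > len) break;  // incomplete sequence at the end
        pos += step;
        --n_chars;
    }
    return pos;
}

// Assumed signature for a text measurer: width in pixels of the first
// n bytes of the string.
using MeasureFn = std::function<double(const char*, int)>;

// Largest byte count, on a code point boundary, whose measured width still
// fits into max_width. Stops at the first prefix that is too wide.
std::size_t utf8_fit_width(const char* s, std::size_t len, double max_width,
                           const MeasureFn& measure) {
    assert(s != nullptr);
    std::size_t best = 0;
    std::size_t pos = 0;
    while (pos < len) {
        std::size_t step = utf8_seq_len(static_cast<unsigned char>(s[pos]));
        if (pos + step > len) break;  // incomplete sequence at the end
        pos += step;
        if (measure(s, static_cast<int>(pos)) > max_width) break;
        best = pos;
    }
    return best;
}
```

Either result is a byte count, so it can be passed as the n argument of
fl_draw(s, n, x, y). Note that the sketch cuts on code point boundaries
only; it can still separate a combining mark from its base letter, which
is one more reason to measure whole prefixes instead of summing
per-character widths.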
Trust me, I know all that because I converted Fl_Text_Editor from ASCII
to UTF-8, and doing this, I learned much more about Unicode than I ever
wanted to know.

>> Anyway, best post an STR, and maybe a simple example we can use for
>> testing.
>
> It's no problem, but... my latest STR (fl_draw_image_mono(...) has no
> body at all!) hang there without any interest

FLTK is an Open Source effort. We are really trying to keep the ball
rolling, but we all have regular jobs and some of us even regular kids
;-) . Please be patient. Our first goal is to keep FLTK1 as stable and
as bug-free as possible.

 - Matthias

_______________________________________________
fltk-dev mailing list
http://lists.easysw.com/mailman/listinfo/fltk-dev
