Me:
>> 2. Although the original UTF-8 encoding allowed for up to 31 bits
>>    per character by using a 6-byte encoding, both the Unicode
>>    Consortium and ISO have decided that they don't need that full
>>    range any more. IIRC they only need 21 bits to represent all
>>    characters. So this is probably the reason for the statement
>>    somewhere in the FLTK code that it only handles 24 bits.
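To make the 21-bit point concrete, here is a minimal sketch (not FLTK code; the names `encode_utf8` and `bits_needed` are made up for illustration). It assumes the current Unicode range, which tops out at U+10FFFF, and shows that the largest code point needs 21 bits and never more than 4 UTF-8 bytes:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: number of bits needed to hold the value v.
int bits_needed(std::uint32_t v) {
    int n = 0;
    while (v != 0) {
        ++n;
        v >>= 1;
    }
    return n;
}

// Hypothetical helper: encode one code point as UTF-8, restricted to
// the current Unicode range (U+0000..U+10FFFF, surrogates excluded).
// Returns an empty string for anything outside that range.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return out;  // out of range, or a UTF-16 surrogate value
    }
    if (cp < 0x80) {            // 7 bits  -> 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {    // 11 bits -> 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 16 bits -> 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                    // 21 bits -> 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, `bits_needed(0x10FFFF)` gives 21 and `encode_utf8(0x10FFFF)` gives a 4-byte string, so a 24-bit limit would still cover every code point in the current range.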
Ian:
> Indeed so - though a 32-bit type is simplest to handle (internally),
> and means that the value can be the same as the Unicode code point...
>
> But then endianness becomes an issue at interfaces - hence the need
> for utf8, which is immune to endianness. Of course, with larger types
> (16 or 32 bit) we can use the BOM to identify the endian ordering of
> the text, but that is such a bodge...

>> [It also says that only 16 bits are really needed for Linux and
>> Windows, which fits with a limited 16-bit wchar_t implementation]

> Hmm, not convinced this is true - it is not uncommon to see utf16
> text with surrogate pairs in it (where the required code point does
> not fit in a utf16 entry and is split over 2 16-bit values) so that
> kind of implies that a 16-bit only implementation isn't going to
> work for us...

The [text] above is based on some comments in the code, so I assume
that Roman or O'ksi'D or Bill or someone had some insight/analysis
to back this up. I don't have the multi-language / script experience
to be able to judge.

As far as I can see, FLTK only needs to concentrate on how to display
UTF-8 characters at the moment. Anyone who is manipulating text with
composing characters, surrogates, bi-directional text, etc. should
really be using some other library, such as icu4c, for the bulk of
the work.

Again, I have no experience of icu4c - I was just reading the web
pages - so I have no idea whether better alternatives are available,
or whether they are fast and light enough for FLTK to link to them.
Maybe that's an RFE for 1.4 or 3.1...

Cheers
D.
