> Well I was hoping to be able to keep MK's code as vanilla as possible.
> If we do need to change it, and it now looks like we do, we should
> still make as few changes we can. I shall investigate this evening.

Indeed so - though my change is simply s/wchar_t/unsigned long/g, and
that's the whole edit. The code then works correctly on all target
platforms, AFAICT. Given the choice, I'd also drop the *_cjk()
backward-compatibility functions as we should not be using them, though
their presence probably does no harm.

> 1. I had assumed that wchar_t was fixed, or what is the point of it,
> but then wchar_t is more of a C/POSIX thing than general Unicode:
> see http://userguide.icu-project.org/posix
> and http://icu-project.org/docs/papers/unicode_wchar_t.html

Yup. The whole wide character thing is a mess. It mainly predates
Unicode of course; for example, the MS version is 16 bits because
that's what they used for their old, pre-Unicode, wide character
support, e.g. for the Far Eastern releases of Windows.

> 2. Although UTF-8 encoding allows for up to 32 bits per character by
> using a 6-byte encoding, both the Unicode Consortium and ISO have
> decided that they don't need that full range any more. IIRC they
> only need 21 bits to represent all characters. So this is probably
> the reason for the statement somewhere in the FLTK code that it
> only handles 24 bits.

Indeed so - though a 32-bit type is simplest to handle (internally),
and means that the value can be the same as the Unicode code point...
But then endianness becomes an issue at interfaces - hence the need for
utf8, which is immune to endianness. Of course, with larger types (16
or 32 bit) we can use the BOM to identify the endian ordering of the
text, but that is such a bodge...

> [It also says that only 16 bits are really needed for Linux and
> Windows, which fits with a limited 16-bit wchar_t implementation]

Hmm, not convinced this is true - it is not uncommon to see utf16 text
with surrogate pairs in it (where the required code point does not fit
in a single utf16 entry and is split over two 16-bit values), so that
implies a 16-bit-only implementation isn't going to work for us...

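
To make the surrogate-pair point concrete, here is a minimal sketch in
C. It is not taken from FLTK or from MK's code: the helper name
utf16_pair_to_ucs() is hypothetical, and returning 0 for an invalid
pair is just an assumption made for this example. It shows that the
decoded value needs more than 16 bits, which is why a type of at least
32 bits such as unsigned long is a safe place to hold it.

```c
/* Hypothetical helper, for illustration only (not FLTK or MK's code).
 * Combines a UTF-16 surrogate pair into a single Unicode code point.
 * Returns 0 if the inputs are not a valid high/low surrogate pair
 * (an assumption of this sketch; real code may want a distinct
 * error value). */
static unsigned long utf16_pair_to_ucs(unsigned int hi, unsigned int lo)
{
  if (hi < 0xD800 || hi > 0xDBFF) return 0; /* not a high surrogate */
  if (lo < 0xDC00 || lo > 0xDFFF) return 0; /* not a low surrogate  */

  /* Each surrogate carries 10 bits; the result is offset by 0x10000,
   * so it always lies outside the 16-bit range. */
  return 0x10000UL
       + ((unsigned long)(hi - 0xD800) << 10)
       + (unsigned long)(lo - 0xDC00);
}
```

For example, utf16_pair_to_ucs(0xD834, 0xDD1E) gives 0x1D11E, which
cannot be stored in a 16-bit wchar_t but fits comfortably in an
unsigned long.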
_______________________________________________
fltk-dev mailing list
http://lists.easysw.com/mailman/listinfo/fltk-dev
