On 16/09/2011 00:01, Dimitri Smits wrote:
>
> errrm, utf-8 can have 6 octets representing one character,

Last time I checked, that was only in the very early stages of developing
the UTF-8 specification. Since then, the maximum size of a UTF-8 encoded
code point has been 4 bytes. If you know otherwise, please post a URL.
Here is the information I have:

  "The original specification allowed for sequences of up to six bytes,
  covering numbers up to 31 bits (the original limit of the Universal
  Character Set). In November 2003 UTF-8 was restricted by RFC 3629 to
  four bytes covering only the range U+0000 to U+10FFFF, in order to
  match the constraints of the UTF-16 character encoding."

  http://en.wikipedia.org/wiki/UTF-8#History

> not forgetting those dioretics that are separate characters.

I'm representing a code point in TfpgChar. If you want the "completed
character as is displayed on the screen", then simply normalize your
TfpgString first, then extract the "character".

Regards,
  - Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
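
A minimal sketch of the two points above, written in Python purely for
illustration (this is not fpGUI code; `utf8_len` and the sample strings
are hypothetical names chosen for the example). It shows the 1 to 4 byte
range allowed by RFC 3629 and how normalizing first can fold a base
letter plus a combining diacritic into one code point.

```python
import unicodedata


def utf8_len(cp: int) -> int:
    """Return how many bytes RFC 3629 UTF-8 needs for one code point."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4  # the maximum since RFC 3629


# Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT (two code points).
decomposed = "e\u0301"
# NFC normalization composes the pair into U+00E9 (one code point).
composed = unicodedata.normalize("NFC", decomposed)

print(utf8_len(0x10FFFF))              # byte count for the highest code point
print(len(decomposed), len(composed))  # code point counts before and after NFC
```

Note that normalization only helps when a precomposed form exists. Some
base-plus-diacritic sequences have none, so they stay as several code
points even after NFC.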