Friends,

(At least in the case of...) When a message is sent via the HTTP interface to smsbox, smsbox_req_handle calls charset_processing to translate the text.
If the coding is UCS2 and the charset is not UTF-16BE, the text is converted to UTF-8 and thence to UTF-16BE. So far so good. But I have some questions about charset_to_utf8.

charset_to_utf8 uses libxml2 to do the grunt work of translating the encoding (xmlFindCharEncodingHandler and xmlCharEncInFunc). I have been looking at...

    http://xmlsoft.org/html/libxml-encoding.html#XMLCHARENCODING

This lists a few character encodings (I have pasted them below). My email archives have emails in quite a few different encodings. What about them?

The charsets I have found in my email archive:

    BIG5  EUC-KR  GB2312  GB2312_CHARSET  ISO-10646  ISO-2022-JP
    ISO-8859-1  ISO-8859-2  ISO-8859-4  ISO-8859-7  ISO-8859-9
    KOI8-R  UNKNOWN-8BIT  US-ASCII  UTF-8
    Windows-1250  Windows-1251  Windows-1252  X-UNKNOWN
    big5  euc-kr  gb2312  iso-2022-kr
    iso-8859-1  iso-8859-13  iso-8859-15
    koi8-r  ks_c_5601-1987  unknown-8bit  windows-1256  x-user-defined

The char encodings listed on http://xmlsoft.org/html/libxml-encoding.html:

    XML_CHAR_ENCODING_ERROR=      -1, /* No char encoding detected */
    XML_CHAR_ENCODING_NONE=        0, /* No char encoding detected */
    XML_CHAR_ENCODING_UTF8=        1, /* UTF-8 */
    XML_CHAR_ENCODING_UTF16LE=     2, /* UTF-16 little endian */
    XML_CHAR_ENCODING_UTF16BE=     3, /* UTF-16 big endian */
    XML_CHAR_ENCODING_UCS4LE=      4, /* UCS-4 little endian */
    XML_CHAR_ENCODING_UCS4BE=      5, /* UCS-4 big endian */
    XML_CHAR_ENCODING_EBCDIC=      6, /* EBCDIC uh! */
    XML_CHAR_ENCODING_UCS4_2143=   7, /* UCS-4 unusual ordering */
    XML_CHAR_ENCODING_UCS4_3412=   8, /* UCS-4 unusual ordering */
    XML_CHAR_ENCODING_UCS2=        9, /* UCS-2 */
    XML_CHAR_ENCODING_8859_1=     10, /* ISO-8859-1 ISO Latin 1 */
    XML_CHAR_ENCODING_8859_2=     11, /* ISO-8859-2 ISO Latin 2 */
    XML_CHAR_ENCODING_8859_3=     12, /* ISO-8859-3 */
    XML_CHAR_ENCODING_8859_4=     13, /* ISO-8859-4 */
    XML_CHAR_ENCODING_8859_5=     14, /* ISO-8859-5 */
    XML_CHAR_ENCODING_8859_6=     15, /* ISO-8859-6 */
    XML_CHAR_ENCODING_8859_7=     16, /* ISO-8859-7 */
    XML_CHAR_ENCODING_8859_8=     17, /* ISO-8859-8 */
    XML_CHAR_ENCODING_8859_9=     18, /* ISO-8859-9 */
    XML_CHAR_ENCODING_2022_JP=    19, /* ISO-2022-JP */
    XML_CHAR_ENCODING_SHIFT_JIS=  20, /* Shift_JIS */
    XML_CHAR_ENCODING_EUC_JP=     21, /* EUC-JP */
    XML_CHAR_ENCODING_ASCII=      22  /* pure ASCII */

Worik

--
Worik Macky Turei Stanton
Bullseye!!
Aotearoa
