Friends

(At least in the case of...) When a message is sent to smsbox via the
HTTP interface, smsbox_req_handle calls charset_processing to
translate the text.

If the coding is UCS2 and the charset is not UTF-16BE, the text is
converted to UTF-8 and then to UTF-16BE.
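
To make the two-step path concrete, here is a minimal sketch of
"source charset, then UTF-8, then UTF-16BE". It uses POSIX iconv rather
than libxml2, so it is not the smsbox code; the function names
(convert, to_ucs2_via_utf8) and the fixed 1024-byte intermediate
buffer are made up for illustration, and the check on coding and
charset is left out.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert inlen bytes from charset `from` to
 * charset `to` with POSIX iconv. Returns the number of bytes written
 * to out, or -1 on any failure (unknown charset, bad input, no room). */
static long convert(const char *from, const char *to,
                    const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);   /* argument order: to, from */
    if (cd == (iconv_t)-1)
        return -1;                       /* charset name not supported */

    char *src = (char *)in;              /* iconv wants a non-const pointer */
    char *dst = out;
    size_t srcleft = inlen, dstleft = outcap;

    size_t rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dstleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(outcap - dstleft);
}

/* Hypothetical two-step conversion: charset -> UTF-8 -> UTF-16BE.
 * The intermediate buffer size is an arbitrary choice for this sketch. */
static long to_ucs2_via_utf8(const char *charset,
                             const char *in, size_t inlen,
                             char *out, size_t outcap)
{
    char utf8[1024];
    long n = convert(charset, "UTF-8", in, inlen, utf8, sizeof utf8);
    if (n < 0)
        return -1;
    return convert("UTF-8", "UTF-16BE", utf8, (size_t)n, out, outcap);
}
```

Calling to_ucs2_via_utf8("ISO-8859-1", "\xE9", 1, out, sizeof out)
should leave the two bytes 00 E9 in out. An unrecognised charset name
makes the first step fail, which is the case I am asking about.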

So far so good.

But I have some questions about charset_to_utf8.

charset_to_utf8 uses libxml2 to do the grunt work of translating the
encoding (xmlFindCharEncodingHandler and xmlCharEncInFunc).  I have
been looking at...

http://xmlsoft.org/html/libxml-encoding.html#XMLCHARENCODING

This lists a few character encodings.  (I have listed them via cut
and paste below).  My email archives have emails in quite a few
different encodings.  What about them?

The char types I have found in my email archive...

         BIG5
         EUC-KR
         GB2312
         GB2312_CHARSET
         ISO-10646
         ISO-2022-JP
         ISO-8859-1
         ISO-8859-2
         ISO-8859-4
         ISO-8859-7
         ISO-8859-9;
         KOI8-R
         UNKNOWN-8BIT
         US-ASCII
         UTF-8
         Windows-1250
         Windows-1251
         Windows-1252
         X-UNKNOWN
         big5
         euc-kr
         gb2312
         iso-2022-kr
         iso-8859-1
         iso-8859-1
         iso-8859-13
         iso-8859-15
         koi8-r
         ks_c_5601-1987
         unknown-8bit
         windows-1256
         x-user-defined
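
One rough way to see which of these names a converter recognises is to
try opening a conversion for each. The sketch below probes POSIX iconv,
not libxml2, so it only approximates what xmlFindCharEncodingHandler
would accept; whether libxml2 falls back to iconv depends on how it was
built, which I am assuming rather than asserting. charset_supported is
a hypothetical name.

```c
#include <iconv.h>

/* Hypothetical probe: returns 1 if iconv can convert from `name`
 * to UTF-8 on this system, 0 otherwise. */
static int charset_supported(const char *name)
{
    iconv_t cd = iconv_open("UTF-8", name);  /* argument order: to, from */
    if (cd == (iconv_t)-1)
        return 0;                            /* name not recognised */
    iconv_close(cd);
    return 1;
}
```

Placeholder labels such as X-UNKNOWN are not real charsets and should
be rejected, while ISO-8859-1 should be accepted in either case. The
results for the other names will vary with the iconv implementation.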

The encodings listed on http://xmlsoft.org/html/libxml-encoding.html:


         XML_CHAR_ENCODING_ERROR=   -1, /* No char encoding detected */
         XML_CHAR_ENCODING_NONE=        0, /* No char encoding detected */
         XML_CHAR_ENCODING_UTF8=        1, /* UTF-8 */
         XML_CHAR_ENCODING_UTF16LE=     2, /* UTF-16 little endian */
         XML_CHAR_ENCODING_UTF16BE=     3, /* UTF-16 big endian */
         XML_CHAR_ENCODING_UCS4LE=      4, /* UCS-4 little endian */
         XML_CHAR_ENCODING_UCS4BE=      5, /* UCS-4 big endian */
         XML_CHAR_ENCODING_EBCDIC=      6, /* EBCDIC uh! */
         XML_CHAR_ENCODING_UCS4_2143=7, /* UCS-4 unusual ordering */
         XML_CHAR_ENCODING_UCS4_3412=8, /* UCS-4 unusual ordering */
         XML_CHAR_ENCODING_UCS2=        9, /* UCS-2 */
         XML_CHAR_ENCODING_8859_1=      10,/* ISO-8859-1 ISO Latin 1 */
         XML_CHAR_ENCODING_8859_2=      11,/* ISO-8859-2 ISO Latin 2 */
         XML_CHAR_ENCODING_8859_3=      12,/* ISO-8859-3 */
         XML_CHAR_ENCODING_8859_4=      13,/* ISO-8859-4 */
         XML_CHAR_ENCODING_8859_5=      14,/* ISO-8859-5 */
         XML_CHAR_ENCODING_8859_6=      15,/* ISO-8859-6 */
         XML_CHAR_ENCODING_8859_7=      16,/* ISO-8859-7 */
         XML_CHAR_ENCODING_8859_8=      17,/* ISO-8859-8 */
         XML_CHAR_ENCODING_8859_9=      18,/* ISO-8859-9 */
         XML_CHAR_ENCODING_2022_JP=  19,/* ISO-2022-JP */
         XML_CHAR_ENCODING_SHIFT_JIS=20,/* Shift_JIS */
         XML_CHAR_ENCODING_EUC_JP=   21,/* EUC-JP */
         XML_CHAR_ENCODING_ASCII=    22 /* pure ASCII */

Worik

-- 
                                                      Worik Macky Turei Stanton
Bullseye!!                                                   [EMAIL PROTECTED]
                                                                       Aotearoa

