Re: Expat XML Parser Full Character Encoding Support

Bruno Haible Tue, 21 Jan 2003 12:43:12 -0800

Michael B. Allen writes:

> So the first column
> is a big endian representation of the multibyte sequence corresponding
> to the UCS code in the right column? So I could generate the maps from
> that information and use the libiconv *_mbtowc functions to do multibyte
> conversions.


Yes.
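A minimal sketch of the approach Michael describes. It assumes table lines of the form `0x8140<TAB>0x3000`, with optional `#` comments. The left column is split into its big-endian bytes. A 256-entry lead-byte array is then built following the convention documented for Expat's `XML_Encoding.map`:

- `c >= 0`: the byte alone encodes code point `c`.
- `-1`: malformed.
- `-n`: the byte starts an n-byte sequence.

Decoding uses a plain dictionary lookup, standing in for the `convert` callback. The helper names `parse_table`, `expat_style_map` and `decode` are hypothetical, and the sample rows are an illustrative Shift_JIS excerpt. The byte length is inferred from the value's magnitude, which assumes no multibyte sequence starts with 0x00.

```python
def parse_table(lines):
    """Map each multibyte sequence (left column, big-endian hex) to its
    UCS code point (right column). Blank, comment-only and one-column
    ("undefined") lines are skipped."""
    table = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) < 2:
            continue
        mb = int(fields[0], 16)  # int() accepts the "0x" prefix with base 16
        # Assumption: byte count follows from the magnitude, i.e. no
        # multibyte sequence begins with a 0x00 byte.
        length = max(1, (mb.bit_length() + 7) // 8)
        table[mb.to_bytes(length, "big")] = int(fields[1], 16)
    return table


def expat_style_map(table):
    """256-entry lead-byte array in the convention of Expat's XML_Encoding.map:
    c >= 0: the byte alone encodes code point c; -1: malformed;
    -n: the byte starts an n-byte sequence."""
    lead = [-1] * 256
    for seq, ucs in table.items():
        lead[seq[0]] = ucs if len(seq) == 1 else -len(seq)
    return lead


def decode(data, table, lead):
    """Decode bytes: the lead-byte array gives each sequence's length, the
    full table gives its code point (standing in for a convert callback)."""
    out = []
    i = 0
    while i < len(data):
        code = lead[data[i]]
        if code == -1:
            raise ValueError(f"malformed byte 0x{data[i]:02X} at offset {i}")
        n = 1 if code >= 0 else -code
        seq = data[i:i + n]
        if seq not in table:  # also catches truncated sequences
            raise ValueError(f"unmapped sequence {seq.hex()} at offset {i}")
        out.append(chr(table[seq]))
        i += n
    return "".join(out)


# Illustrative Shift_JIS excerpt in the two-column format.
SAMPLE = """\
# Shift_JIS excerpt
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0x8140\t0x3000\t# IDEOGRAPHIC SPACE
0x82A0\t0x3042\t# HIRAGANA LETTER A
"""

table = parse_table(SAMPLE.splitlines())
lead = expat_style_map(table)
print(decode(b"A\x81\x40\x82\xa0", table, lead) == "A\u3000\u3042")  # True
```

Keeping the lead-byte array separate from the full table mirrors the split Expat makes. The array only tells the parser how many bytes a sequence occupies, while the multibyte lookup itself happens in the callback.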

> Incidentally why is there no ISO-2022-JP.TXT?

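ISO-2022-JP cannot be described by such a table. It is a stateful
encoding: escape sequences switch between character sets, so the
meaning of a byte depends on the escape sequences that preceded it.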
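The following sketch makes the statefulness concrete. The helper `segments` is hypothetical; it tracks the four escape sequences that RFC 1468 defines for ISO-2022-JP. The result is cross-checked against Python's built-in `iso2022_jp` codec. The same two bytes, 0x24 0x22, read as `$"` in ASCII mode but as HIRAGANA LETTER A (U+3042) after ESC $ B. No single row of a two-column table can describe both.

```python
# Escape sequences that switch the active character set (RFC 1468).
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def segments(data):
    """Split an ISO-2022-JP byte string into (charset, payload) runs.

    The charset of every payload byte depends on the most recent escape
    sequence, i.e. on state that a byte-to-UCS table has no column for."""
    state = "ASCII"  # ISO-2022-JP text starts out in ASCII
    runs = []
    start = i = 0
    while i < len(data):
        for esc, name in ESCAPES.items():
            if data.startswith(esc, i):
                if i > start:
                    runs.append((state, data[start:i]))
                state = name
                i += len(esc)
                start = i
                break
        else:
            i += 1  # ordinary payload byte
    if start < len(data):
        runs.append((state, data[start:]))
    return runs


# The bytes 0x24 0x22 ('$"') appear twice: once in ASCII, once after ESC $ B.
data = b'$"\x1b$B$"\x1b(B'
print(segments(data))             # [('ASCII', b'$"'), ('JIS X 0208-1983', b'$"')]
print(data.decode("iso2022_jp"))  # $"あ
```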

Even with an Expat that understands encodings other than UTF-8 and
ISO-8859-1, people should continue to use UTF-8 for their XML files.
Quoting from http://www.w3.org/TR/charmod/ :

  "When specifications choose to allow encodings other than Unicode
   encodings, implementers should be aware that the correspondence
   between the characters of a legacy encoding and Unicode characters
   may in practice depend on the software used for transcoding. See the
   Japanese XML Profile [http://www.w3.org/TR/japanese-xml/] for
   examples of such inconsistencies."

Bruno

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
