Re: Standardized encoding names for iconv_open()

Bruno Haible Wed, 19 May 2004 10:21:48 -0700

Markus Kuhn wrote:
> In general, the POSIX definition of iconv_open() would become *much*
> more useful, if it actually specified a couple of encoding strings, and
> what exactly they mean.


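I second that. Java has a similar "minimal supported set of encodings"
in its conversion facility: every implementation of java.nio.charset.Charset
must support US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE and UTF-16.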
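For comparison, POSIX today leaves every name up to the implementation, so a portable program can only probe for the names it needs. Below is a minimal sketch, assuming a POSIX <iconv.h>. The helper encoding_supported() is hypothetical and not part of any standard. It tests a name by trying to open a conversion to "UTF-8", and even that name is not guaranteed to exist.

```c
#include <iconv.h>

/* Hypothetical helper (not part of POSIX): returns 1 if iconv_open()
   accepts a conversion from `name` to "UTF-8", 0 otherwise.  Since POSIX
   specifies no encoding names, even "UTF-8" itself has to be probed. */
static int encoding_supported(const char *name)
{
    iconv_t cd = iconv_open("UTF-8", name);
    if (cd == (iconv_t)-1)
        return 0;  /* unsupported pair: iconv_open() sets errno to EINVAL */
    iconv_close(cd);
    return 1;
}
```

A call such as encoding_supported("EUC-JP") may give different answers on different systems. That is the portability gap a standardized list of names would close.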

>   ""                   multi-byte encoding of current LC_CTYPE locale
>   "UTF-8"              UTF-8 (with overlong sequences being illegal)
>   "UTF-16"             UTF-16 (same byte order as C's short)
>   "UTF-16BE"           UTF-16 BigEndian
>   "UTF-16LE"           UTF-16 LittleEndian
>   "UTF-32"             UTF-32 (same byte order as C's long)
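>   ...

"UTF-16" and "UTF-32" are defined differently from "same byte order as
C's short". RFC 2781 specifies that a "UTF-16" stream may begin with a
byte order mark (FE FF for big-endian, FF FE for little-endian) and is
otherwise to be read as big-endian, regardless of the host's byte order.
The Unicode Standard defines "UTF-32" the same way. It's better to refer
to these lengthy definitions than to paraphrase them.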
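To make the difference concrete, here is a minimal C sketch of how a reader might choose the byte order of data labelled "UTF-16", following the BOM rule of RFC 2781. The names enum utf16_order, utf16_detect_order() and utf16_unit() are hypothetical and do not come from iconv or any library. Skipping the BOM once it has been recognized is a design choice of this sketch.

```c
#include <stddef.h>

/* Hypothetical type: the two byte orders a "UTF-16" stream can have. */
enum utf16_order { UTF16_BE, UTF16_LE };

/* Picks the byte order of a buffer labelled "UTF-16": FE FF means
   big-endian, FF FE means little-endian, and with no BOM the text is
   read as big-endian, whatever the host's `short` looks like.
   *bom_len receives the number of leading bytes to skip. */
static enum utf16_order utf16_detect_order(const unsigned char *buf, size_t len,
                                           size_t *bom_len)
{
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return UTF16_LE;
    }
    *bom_len = 0;
    return UTF16_BE;  /* no BOM: big-endian by default */
}

/* Reads the 16-bit code unit at buf[i] and buf[i + 1] in the given order. */
static unsigned utf16_unit(const unsigned char *buf, size_t i, enum utf16_order o)
{
    return o == UTF16_BE ? (unsigned)(buf[i] << 8 | buf[i + 1])
                         : (unsigned)(buf[i + 1] << 8 | buf[i]);
}
```

The same bytes 00 41 decode to U+0041 on both big-endian and little-endian hosts, which a "same as C's short" definition would not guarantee.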

> and perhaps even
>
>   "EUC-JP", "EUC-KR", "EUC-TW", "GB18030"

I don't think there is a normative, widely used definition of EUC-TW.
As for GB18030, its official definition exists only in Chinese, not
English, and that has not prevented different vendors from shipping
differing implementations.

Bruno


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
