Re: [courier-users] The Possibility to Substitute GNU Libiconv for Your Unicode Library

mag Sat, 27 May 2006 07:35:16 -0700

    Iconv is a simple, stream-oriented API that conforms to UNIX 98.
Perhaps subscribers of those mailing lists considered iconv to be too
simple. I know of only 4 manual pages about GNU libiconv:


/usr/local/man/man1/iconv.1.gz
/usr/local/man/man3/iconv.3.gz
/usr/local/man/man3/iconv_open.3.gz
/usr/local/man/man3/iconv_close.3.gz
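
To illustrate what "stream-oriented" means here, the following is a minimal sketch in Python rather than C. It does not call libiconv; it uses Python's standard incremental codecs as a rough analogue of the iconv_open / iconv / iconv_close sequence. The function name convert_stream is made up for this example.

```python
import codecs

def convert_stream(chunks, from_charset, to_charset):
    """Convert an iterable of byte chunks from one charset to another."""
    # Creating the two codec objects plays roughly the role of iconv_open().
    decoder = codecs.getincrementaldecoder(from_charset)()
    encoder = codecs.getincrementalencoder(to_charset)()
    for chunk in chunks:
        # A multibyte character split across two chunks is buffered by the
        # decoder until the rest of it arrives.
        yield encoder.encode(decoder.decode(chunk))
    # Flush any pending state before the conversion is closed.
    yield encoder.encode(decoder.decode(b"", final=True), final=True)

data = "中文".encode("gb2312")
# Deliberately split the input in the middle of the first two-byte character.
chunks = [data[:1], data[1:]]
result = b"".join(convert_stream(chunks, "gb2312", "utf-8"))
```

The caller never needs to align chunk boundaries with character boundaries, which is the property that makes this style of API convenient for converting message content as it is read.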

    Well, the metadata are Courier-specific and need to be preserved and
developed. But the substantial encoding conversion can be provided by
GNU libiconv, can't it?

    I could write some of the relevant code.

------------------------------------------------------------------------
                                               From Beijing, China

Sam Varshavchik wrote:

[EMAIL PROTECTED] writes:
     Are these just "meta data" you referred to?
Yes. The unicode library in Courier does not just convert stuff from one character set to another. I also need to know some metadata about each character set, such as what I listed below.
When, for example, encoding the character set in a message's header or body, I need to know whether the character set uses shift-in/shift-out character sequences, if so base64 must be used for encoding the character set in the headers. Even in character sets that don't use shift-in/shift-out sequences, I still need to know the preferred encoding method, in order to automatically select the best one when encoding message content.
I remember that many years ago I sent a mail to whatever mailing list address I dug up out of iconv's documentation. My mail was ignored.
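
The rule about shift sequences and base64 can be sketched as follows. This is an illustration only, not Courier's code: the function names are invented, the Q-encoder is deliberately conservative, and the check inspects the encoded bytes for ESC/SO/SI as a stand-in for the per-charset metadata described above, which a real implementation would look up instead.

```python
import base64
import string

ESC, SO, SI = 0x1B, 0x0E, 0x0F           # escape, shift-out, shift-in
SAFE = string.ascii_letters + string.digits

def uses_shift_sequences(raw: bytes) -> bool:
    """Heuristic stand-in for charset metadata: look for shift/escape bytes."""
    return any(b in (ESC, SO, SI) for b in raw)

def q_encode(raw: bytes) -> str:
    """Conservative quoted-printable style encoding for a header word."""
    return "".join(chr(b) if chr(b) in SAFE else "=%02X" % b for b in raw)

def encode_header_word(text: str, charset: str) -> str:
    """Build an RFC 2047 style encoded word, choosing B or Q encoding."""
    raw = text.encode(charset)
    if uses_shift_sequences(raw):
        # Escape sequences are present, so base64 is used.
        return "=?%s?B?%s?=" % (charset, base64.b64encode(raw).decode("ascii"))
    # Assumption: quoted-printable is the fallback when no shifts are present.
    return "=?%s?Q?%s?=" % (charset, q_encode(raw))

latin = encode_header_word("café", "iso-8859-1")
japanese = encode_header_word("日本", "iso-2022-jp")
```

Here ISO-2022-JP output contains escape sequences and so ends up base64-encoded, while the ISO 8859-1 word is quoted-printable. The fallback to quoted-printable is an assumption of this sketch; as noted above, the preferred method for a shift-free character set is itself a piece of metadata.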
     Maintaining conversion tables for oriental languages' encodings is
hard work. For example, your GB2312 table includes only 6763
Chinese characters, but our MANDATORY new national standard GB18030
covers 27484 Chinese characters! If we only use GB2312, we cannot even
spell our ex-prime minister's name (Rong-Ji Zhu), and we cannot print
the full text of most Chinese classical novels.
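
The coverage gap can be checked with a short sketch. It uses Python's built-in gb2312 and gb18030 codecs, not Courier's tables or libiconv, so it only illustrates the point about the two standards; can_encode is a helper made up for this example.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text is representable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

name = "朱镕基"  # the ex-prime minister's name mentioned above

gb2312_ok = can_encode(name, "gb2312")
gb18030_ok = can_encode(name, "gb18030")
# Characters of the name that GB2312 cannot represent.
missing = [ch for ch in name if not can_encode(ch, "gb2312")]
```

With these codecs the name fails under GB2312 because of its middle character, while GB18030 encodes all three characters.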

     Apart from the metadata, it is wiser to make use of the substantial
conversion tables provided by other professional libraries.

     If you agree with me, I and others will help you with oriental
languages. Western language encodings (ISO 8859-X, KOI-8, IBM/Microsoft)
are much simpler than CJK and easy to handle.

------------------------------------------------------------------------
                                                From Beijing, China

Sam Varshavchik wrote:
Ysbeer writes:
Sam,

Out of curiosity, have you ever considered using ICU for handling your
Unicode requirements?
I am not familiar with ICU's capabilities. The requirements are that for a given character set, I must know whether or not:
1) The character set's lower 128 bytes consist of US-ASCII
2) The character set is a direct mapping of unicode (UTF-8, UTF-7, et al)
3) Whether the character set uses multibyte characters
4) The character set uses composite mapping using shift-in/shift-out escape codes
5) Unrepresentable unicode characters may be ignored when converting unicode to/from the character set
6) Whether quoted-printable or base-64 is best for encoding the character set in the message's headers or body.
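
The six requirements above could be carried alongside a conversion library as a small per-charset record. The following is a sketch under stated assumptions: the class name, field names and table are hypothetical, and the flag values are illustrative guesses rather than entries from Courier's actual tables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CharsetInfo:
    name: str
    ascii_lower_half: bool        # 1) lower 128 bytes are US-ASCII
    unicode_mapping: bool         # 2) direct mapping of unicode (UTF-8, UTF-7)
    multibyte: bool               # 3) uses multibyte characters
    shift_sequences: bool         # 4) uses shift-in/shift-out escape codes
    ignore_unrepresentable: bool  # 5) unrepresentable characters may be ignored
    prefer_base64: bool           # 6) base-64 preferred over quoted-printable

    def header_encoding(self) -> str:
        """Return "B" (base-64) or "Q" (quoted-printable) for headers."""
        # Shift sequences force base-64 regardless of the stated preference.
        return "B" if self.shift_sequences or self.prefer_base64 else "Q"

# Illustrative values only; a real table would need checking per charset.
CHARSETS = {info.name: info for info in (
    CharsetInfo("iso-8859-1", True, False, False, False, False, False),
    CharsetInfo("utf-8", True, True, True, False, False, True),
    CharsetInfo("iso-2022-jp", True, False, True, True, False, True),
)}

choice = {name: info.header_encoding() for name, info in CHARSETS.items()}
```

Keeping these flags in a separate table means the conversion itself could be delegated to another library while the mail-specific decisions stay with the caller.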

