Iconv is a simple, stream-oriented API that conforms to UNIX 98.
Perhaps subscribers of those mailing lists considered iconv to be too
simple. I know of only four manual pages for GNU libiconv:

/usr/local/man/man1/iconv.1.gz
/usr/local/man/man3/iconv.3.gz
/usr/local/man/man3/iconv_open.3.gz
/usr/local/man/man3/iconv_close.3.gz

    Well, the metadata are Courier-specific and need to be kept and
developed. But the actual encoding conversion can be provided by
GNU libiconv, can't it?

    I am willing to write some of the relevant code.
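
    As a rough illustration of the stream-oriented API mentioned above, here is a minimal sketch of a one-shot conversion built on iconv_open(), iconv() and iconv_close(). The helper name convert_buffer and its return convention are made up for this example; they are not part of libiconv or Courier. Which charset names are accepted depends on the iconv implementation installed.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libiconv or Courier): convert in_len
 * bytes of `in` from charset `from` to charset `to`, writing into `out`.
 * Returns the number of bytes written, or (size_t)-1 on any error. */
size_t convert_buffer(const char *from, const char *to,
                      const char *in, size_t in_len,
                      char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    /* Note the argument order: target charset first, source second. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;            /* conversion pair not supported */

    char *in_ptr = (char *)in;        /* iconv() wants char **, not const */
    char *out_ptr = out;
    size_t in_left = in_len;
    size_t out_left = out_cap;

    /* Convert the input; fails on invalid input or a full output buffer. */
    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);

    /* For stateful charsets, emit the sequence that returns the output
     * to its initial shift state. */
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);

    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : out_cap - out_left;
}
```

    A real mail filter would more likely call iconv() in a loop over chunks and inspect errno (E2BIG, EILSEQ, EINVAL) instead of treating every failure alike; this sketch only shows the basic call sequence.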

------------------------------------------------------------------------
                                               From Beijing, China

Sam Varshavchik wrote:

[EMAIL PROTECTED] writes:

     Are these just the "meta data" you referred to?

Yes. The Unicode library in Courier does not just convert text from one character set to another. I also need to know some metadata about each character set, such as what I listed below.

For example, when encoding text in a given character set for a message's header or body, I need to know whether the character set uses shift-in/shift-out character sequences; if so, base64 must be used for encoding that character set in the headers. Even for character sets that don't use shift-in/shift-out sequences, I still need to know the preferred encoding method, in order to automatically select the best one when encoding message content.
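
To make the base64 case concrete, here is a minimal sketch of wrapping header text in an RFC 2047 "B" encoded-word (=?charset?B?...?=). The function name encode_word_b is hypothetical and is not Courier's code. The sketch assumes the text is already in the named character set, and it does not split long input to respect the 75-character limit RFC 2047 places on each encoded-word.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "=?charset?B?<base64 of text>?=" into out.
 * Returns the length written (excluding the NUL), or -1 if out is too
 * small. Does NOT enforce RFC 2047's 75-character encoded-word limit. */
int encode_word_b(const char *charset, const unsigned char *text,
                  size_t len, char *out, size_t out_cap)
{
    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /* "=?" + charset + "?B?" + base64 payload + "?=" + terminating NUL */
    size_t need = 2 + strlen(charset) + 3 + 4 * ((len + 2) / 3) + 2 + 1;
    if (need > out_cap)
        return -1;

    int n = sprintf(out, "=?%s?B?", charset);
    char *p = out + n;

    /* Encode three input bytes as four base64 characters, padding with '='. */
    for (size_t i = 0; i < len; i += 3) {
        unsigned long v = (unsigned long)text[i] << 16;
        if (i + 1 < len) v |= (unsigned long)text[i + 1] << 8;
        if (i + 2 < len) v |= text[i + 2];
        *p++ = b64[(v >> 18) & 63];
        *p++ = b64[(v >> 12) & 63];
        *p++ = (i + 1 < len) ? b64[(v >> 6) & 63] : '=';
        *p++ = (i + 2 < len) ? b64[v & 63] : '=';
    }
    strcpy(p, "?=");
    return (int)(p + 2 - out);
}
```

Base64 is the safe choice for character sets with shift sequences because the escape bytes never appear literally in the header; only characters from the base64 alphabet do.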

I remember that many years ago I sent a mail to whatever mailing list address I dug up out of iconv's documentation. My mail was ignored.



     Maintaining encoding conversion tables for oriental languages is
hard work. For example, your GB2312 table includes only 6763
Chinese characters, but our MANDATORY new national standard, GB18030,
covers 27484 Chinese characters! With GB2312 alone we cannot even
spell our ex-prime minister's name (Rong-Ji Zhu), and we cannot print
the full text of most classical Chinese novels.

     Apart from the metadata, it is wiser to use the actual conversion
tables provided by other specialized libraries.

     If you agree with me, others and I will help you with the oriental
languages. Western-language encodings (ISO 8859-X, KOI-8, IBM/Microsoft)
are much simpler than CJK and easy to handle.

------------------------------------------------------------------------
                                                From Beijing, China

Sam Varshavchik wrote:

Ysbeer writes:

Sam,

Out of curiosity, have you ever considered using ICU for handling your
Unicode requirements?

I am not familiar with ICU's capabilities. The requirements are that, for a given character set, I must know:

1) Whether the character set's lower 128 bytes consist of US-ASCII

2) Whether the character set is a direct mapping of Unicode (UTF-8, UTF-7, et al)

3) Whether the character set uses multibyte characters

4) Whether the character set uses composite mapping with shift-in/shift-out escape codes

5) Whether unrepresentable Unicode characters may be ignored when converting Unicode to/from the character set

6) Whether quoted-printable or base64 is best for encoding the character set in the message's headers or body.
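
The six properties above could be recorded as one flag word per character set. The sketch below is only an illustration: the flag names, the struct, the lookup function and the table entries are assumptions made for this example, not Courier's actual data, and the "prefer base64" choices in particular are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One bit per requirement in the list above (names are hypothetical). */
enum {
    CS_ASCII_LOW128   = 1 << 0, /* 1) lower 128 bytes are US-ASCII        */
    CS_UNICODE_DIRECT = 1 << 1, /* 2) direct mapping of Unicode           */
    CS_MULTIBYTE      = 1 << 2, /* 3) uses multibyte characters           */
    CS_SHIFT_SEQ      = 1 << 3, /* 4) uses shift-in/shift-out escapes     */
    CS_IGNORE_UNREP   = 1 << 4, /* 5) unrepresentable chars may be ignored */
    CS_PREFER_BASE64  = 1 << 5  /* 6) base64 preferred; clear means QP    */
};

struct charset_info {
    const char *name;
    unsigned flags;
};

/* Illustrative entries only; a real table would be far larger. */
static const struct charset_info charset_table[] = {
    { "US-ASCII",    CS_ASCII_LOW128 },
    { "ISO-8859-1",  CS_ASCII_LOW128 },
    { "UTF-8",       CS_ASCII_LOW128 | CS_UNICODE_DIRECT | CS_MULTIBYTE },
    { "GB2312",      CS_ASCII_LOW128 | CS_MULTIBYTE | CS_PREFER_BASE64 },
    { "ISO-2022-JP", CS_MULTIBYTE | CS_SHIFT_SEQ | CS_PREFER_BASE64 },
};

/* Exact-match lookup by name; returns NULL for unknown character sets.
 * (A real implementation would likely compare case-insensitively.) */
const struct charset_info *charset_lookup(const char *name)
{
    size_t count = sizeof charset_table / sizeof charset_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(charset_table[i].name, name) == 0)
            return &charset_table[i];
    }
    return NULL;
}

/* Header rule from the thread: shift sequences force base64 in headers;
 * otherwise fall back to the character set's preferred encoding. */
int charset_header_needs_base64(const struct charset_info *cs)
{
    assert(cs != NULL);
    return (cs->flags & (CS_SHIFT_SEQ | CS_PREFER_BASE64)) != 0;
}
```

Keeping this metadata in a small table separate from the conversion tables means the conversion itself could be delegated to another library while the mail-specific knowledge stays in one place.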

_______________________________________________
courier-users mailing list
[email protected]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users
