Hi, Mr. Sam,
I think GNU libiconv is a better choice than maintaining a Unicode
library yourself. Libiconv's maintainers are better placed to keep up
with changes from the Unicode Consortium.
In fact, it is East Asian users, whose languages need large character
sets, who have a much more pressing need for Unicode support than
Western users, most of whose languages can be expressed in 256-glyph
character sets.
At the same time, maintaining large character sets such as Chinese
(GB18030, BIG5-HKSCS) and Japanese (Shift-JIS) is tiring work. The
bodies that define these encodings, the Chinese, Japanese and Korean
governments and other organizations, revise the standards continually
as writing conventions change.
You told me that GNU libiconv cannot provide the metadata that
Courier requires.
But I think there is a workaround with GNU libiconv:
Assume a byte string [b1 b2 b3 b4 b5 ... bn], terminated by CR/LF.
1. Initialize GNU libiconv: iconv_open("UCS-4BE", "SOME ENCODING");
2. Try iconv() against: [b1]
If it succeeds, the current character is [b1]; skip [b1] and continue
from step 2. (After each skip, b1 means the first remaining byte.)
3. Try iconv() against: [b1 b2]
If it succeeds, the current character is [b1 b2]; skip [b1 b2] and
continue from step 2.
4. Try iconv() against: [b1 b2 b3]
If it succeeds, the current character is [b1 b2 b3]; skip [b1 b2 b3]
and continue from step 2.
5. Try iconv() against: [b1 b2 b3 b4]
If it succeeds, the current character is [b1 b2 b3 b4]; skip
[b1 b2 b3 b4] and continue from step 2.
6. Try iconv() against: [b1 b2 b3 b4 b5]
If it succeeds, the current character is [b1 b2 b3 b4 b5]; skip
[b1 b2 b3 b4 b5] and continue from step 2.
7. Try iconv() against: [b1 b2 b3 b4 b5 b6]
If it succeeds, the current character is [b1 b2 b3 b4 b5 b6]; skip
[b1 b2 b3 b4 b5 b6] and continue from step 2.
8. If every trial fails, output "?" as a dummy substitution, skip [b1],
and continue from step 2.
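To make the loop above concrete, here is a minimal sketch in Python.
It uses Python's built-in codecs (bytes.decode) in place of iconv(),
so it only illustrates the trial-and-skip logic, not libiconv itself.
The function name split_characters and the max_len parameter are made
up for this example.

```python
def split_characters(data, encoding, max_len=6):
    """Split `data` into (raw_bytes, text) pairs, one pair per character.

    Mirrors steps 2-8: try 1 byte, then 2, ... up to max_len bytes;
    if nothing decodes, emit "?" and skip a single byte.
    bytes.decode() stands in for iconv() here (an assumption of this sketch).
    """
    chunks = []
    pos = 0
    while pos < len(data):
        matched = False
        # Never try more bytes than are left in the input.
        limit = min(max_len, len(data) - pos)
        for length in range(1, limit + 1):
            candidate = data[pos:pos + length]
            try:
                text = candidate.decode(encoding)
            except UnicodeDecodeError:
                continue  # incomplete or invalid: try one more byte
            chunks.append((candidate, text))
            pos += length
            matched = True
            break
        if not matched:
            # Step 8: dummy substitution, skip one byte.
            chunks.append((data[pos:pos + 1], "?"))
            pos += 1
    return chunks


if __name__ == "__main__":
    for raw, text in split_characters("A\u4e2dB".encode("gbk"), "gbk"):
        print(raw, text)
```

Note that this only works for encodings without shift states. A
stateful encoding such as ISO-2022-JP cannot be split this way,
because the meaning of a byte depends on earlier escape sequences.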
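Of course, the above workaround can be optimized, because most
encodings only use a few sequence lengths.
Only trials of [b1] and [b1 b2] are needed for GB2312, GBK, BIG5,
BIG5-HKSCS and Shift-JIS.
EUC-JP requires [b1], [b1 b2] and [b1 b2 b3] (the three-byte form is
used for JIS X 0212 characters).
GB18030 requires [b1], [b1 b2] and [b1 b2 b3 b4].
UTF-8 requires [b1], [b1 b2], [b1 b2 b3] and [b1 b2 b3 b4]; the trials
of [b1 b2 b3 b4 b5] and [b1 b2 b3 b4 b5 b6] are only needed if the
converter still accepts the older sequences that RFC 3629 removed.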
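As a rough sketch of this optimization, the trial lengths can be kept
in a small table keyed by encoding. Again this is Python with its
built-in codecs standing in for iconv(); the names TRIAL_LENGTHS,
DEFAULT_LENGTHS and split_with_table are made up for the example. Two
entries are my own assumptions: EUC-JP gets a three-byte trial for its
JIS X 0212 sequences, and UTF-8 stops at four bytes as in RFC 3629.

```python
# Candidate sequence lengths per encoding (Python codec names).
# The euc_jp and utf-8 rows are assumptions of this sketch, see above.
TRIAL_LENGTHS = {
    "gb2312": (1, 2),
    "gbk": (1, 2),
    "big5": (1, 2),
    "big5hkscs": (1, 2),
    "shift_jis": (1, 2),
    "euc_jp": (1, 2, 3),
    "gb18030": (1, 2, 4),
    "utf-8": (1, 2, 3, 4),
}

# Fallback for encodings not in the table: try every length up to six.
DEFAULT_LENGTHS = (1, 2, 3, 4, 5, 6)


def split_with_table(data, encoding):
    """Trial-decode `data`, trying only the lengths listed for `encoding`."""
    lengths = TRIAL_LENGTHS.get(encoding.lower(), DEFAULT_LENGTHS)
    out = []
    pos = 0
    while pos < len(data):
        for n in lengths:
            chunk = data[pos:pos + n]
            if len(chunk) < n:
                continue  # not enough bytes left for this trial
            try:
                text = chunk.decode(encoding)
            except UnicodeDecodeError:
                continue  # this length does not form a character
            out.append((chunk, text))
            break
        else:
            # No trial succeeded: substitute "?" and skip one byte.
            chunk = data[pos:pos + 1]
            out.append((chunk, "?"))
        pos += len(chunk)
    return out
```

For GB18030 this skips the three-byte trial entirely, so a four-byte
character costs three conversion attempts instead of four.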
------------------------------------------------------------------------
From Beijing, China
_______________________________________________
courier-users mailing list