[A discussion about replacing JapaneseCodecs and KoreanCodecs in Mailman
2.1.4 with CJKCodecs]
On Mon, 2003-12-29 at 03:26, Tokio Kikuchi wrote:
> Sorry again Barry.
>
> We have to keep JapaneseCodecs and KoreanCodecs in the distribution
> and install them in the pythonlib directory because the email package
> designates japanese and korean as the prefixes of the charset codecs.
> I will have to study cjkcodecs behavior more (it looks like the japanese
> part has an old bug from an earlier distribution of JapaneseCodecs),
> so please cancel this checkin.
Oh dang.
The problem is CODEC_MAP in email/Charset.py, right?
Here's a hack for Mailman 2.1.4:
-----japanese.py
from cjkcodecs import euc_jp, iso_2022_jp, shift_jis
-----korean.py
from cjkcodecs import euc_kr, cp949, iso_2022_kr, johab
We add these two files to Mailman's pythonlib, and then the imports in
Charset.py should work correctly.
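
One alternative to shim modules, sketched here under assumptions: the email package has an add_codec() helper that writes into CODEC_MAP at runtime, so an application could in principle repoint charsets at startup instead of shipping japanese.py and korean.py. The sketch uses the modern module name email.charset (email.Charset in the releases discussed here). The charset name 'x-demo-jp' and the register_codecs() helper are made up for illustration, and 'euc_jp' assumes a Python that bundles that codec.

```python
import email.charset as charset_mod


def register_codecs(mapping):
    """Point each charset at a codec name via email.charset.add_codec().

    `mapping` is a dict of {charset name: codec name}. This is a
    hypothetical helper, not part of Mailman or the email package.
    """
    for charset, codec_name in mapping.items():
        charset_mod.add_codec(charset, codec_name)


# 'x-demo-jp' is an invented charset name so that the sketch does not
# disturb any real entry in CODEC_MAP.
register_codecs({'x-demo-jp': 'euc_jp'})

# Charset objects consult CODEC_MAP when they are constructed.
demo = charset_mod.Charset('x-demo-jp')
print(demo.input_codec)
```

Whether this is preferable to patching Charset.py depends on who owns the mapping: add_codec() keeps the change inside the application, while the patch fixes it for every user of the email package.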
It would be nice if cjkcodecs provided backwards compatibility.
Otherwise, we probably want to provide some ourselves in
email/Charset.py. I'm not sure there's a better way to do this, but
attached is a strawman (untested) patch for email 2.5.5/Python 2.3.4.
It's too late to get this into Python 2.3.3, but if this is acceptable,
I can check this in for Python 2.3.4, and cut a new email package
tarball for Mailman 2.1.4, forgoing the above hack.
-Barry
Index: Charset.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/email/Charset.py,v
retrieving revision 1.13
diff -u -r1.13 Charset.py
--- Charset.py 6 Mar 2003 05:16:29 -0000 1.13
+++ Charset.py 29 Dec 2003 13:50:48 -0000
@@ -88,24 +88,8 @@
     'ascii': 'us-ascii',
     }
 
-# Map charsets to their Unicode codec strings. Note that Python doesn't come
-# with any Asian codecs by default. Here's where to get them:
-#
-# Japanese -- http://www.asahi-net.or.jp/~rd6t-kjym/python
-# Korean -- http://sf.net/projects/koco
-# Chinese -- http://sf.net/projects/python-codecs
-#
-# Note that these codecs have their own lifecycle and may be in varying states
-# of stability and useability.
-
+# Map charsets to their Unicode codec strings.
 CODEC_MAP = {
-    'euc-jp': 'japanese.euc-jp',
-    'iso-2022-jp': 'japanese.iso-2022-jp',
-    'shift_jis': 'japanese.shift_jis',
-    'euc-kr': 'korean.euc-kr',
-    'ks_c_5601-1987': 'korean.cp949',
-    'iso-2022-kr': 'korean.iso-2022-kr',
-    'johab': 'korean.johab',
     'gb2132': 'eucgb2312_cn',
     'big5': 'big5_tw',
     'utf-8': 'utf-8',
@@ -114,6 +98,47 @@
     # Let that stuff pass through without conversion to/from Unicode.
     'us-ascii': None,
     }
+
+
+# Python doesn't come with any Asian codecs by default, but there are several
+# distutils packages available separately. The current preference is for the
+# combined CJKCodecs, providing Chinese, Japanese, and Korean codecs:
+#
+# CJKCodecs -- http://cjkpython.i18n.org
+#
+# Alternatively, you can download the separate Asian codecs:
+#
+# Japanese -- http://www.asahi-net.or.jp/~rd6t-kjym/python
+# Korean -- http://sf.net/projects/koco
+# Chinese -- http://sf.net/projects/python-codecs
+#
+# Note that all these codecs have their own lifecycle and may be in varying
+# states of stability and useability.
+
+try:
+    # Preference is for cjkcodecs
+    import cjkcodecs
+
+    CODEC_MAP.update({
+        'euc-jp': 'cjkcodecs.euc-jp',
+        'iso-2022-jp': 'cjkcodecs.iso-2022-jp',
+        'shift_jis': 'cjkcodecs.shift_jis',
+        'euc-kr': 'cjkcodecs.euc-kr',
+        #'ks_c_5601-1987': 'cjkcodecs.cp949',
+        'iso-2022-kr': 'cjkcodecs.iso-2022-kr',
+        'johab': 'cjkcodecs.johab',
+        })
+except ImportError:
+    # XXX we don't test for these fallback codecs being available
+    CODEC_MAP.update({
+        'euc-jp': 'japanese.euc-jp',
+        'iso-2022-jp': 'japanese.iso-2022-jp',
+        'shift_jis': 'japanese.shift_jis',
+        'euc-kr': 'korean.euc-kr',
+        'ks_c_5601-1987': 'korean.cp949',
+        'iso-2022-kr': 'korean.iso-2022-kr',
+        'johab': 'korean.johab',
+        })
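
The XXX comment in the patch notes that the fallback codecs are never checked for availability. A minimal sketch of one way to probe for them, assuming codecs.lookup() raises LookupError for a codec name it cannot resolve: try each candidate in order of preference and keep the first one that resolves. The CANDIDATES table and both helper functions are hypothetical, and the third name in each list (a codec bundled with later Pythons) is an assumption added for illustration, not something the patch proposes.

```python
import codecs

# Candidate codec names per charset, most preferred first.
# The last entry in each list is an assumed built-in codec name.
CANDIDATES = {
    'euc-jp': ['cjkcodecs.euc-jp', 'japanese.euc-jp', 'euc_jp'],
    'iso-2022-jp': ['cjkcodecs.iso-2022-jp', 'japanese.iso-2022-jp',
                    'iso2022_jp'],
    'shift_jis': ['cjkcodecs.shift_jis', 'japanese.shift_jis', 'shift_jis'],
    'euc-kr': ['cjkcodecs.euc-kr', 'korean.euc-kr', 'euc_kr'],
}


def first_available(names):
    """Return the first codec name that codecs.lookup() can resolve,
    or None if none of the candidates is installed."""
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            continue
        return name
    return None


def build_codec_map(candidates):
    """Build a charset -> codec name dict, skipping charsets for which
    no candidate codec is available."""
    codec_map = {}
    for charset, names in candidates.items():
        chosen = first_available(names)
        if chosen is not None:
            codec_map[charset] = chosen
    return codec_map


if __name__ == '__main__':
    for charset, codec_name in sorted(build_codec_map(CANDIDATES).items()):
        print(charset, '->', codec_name)
```

Probing each codec costs a lookup per candidate at import time, but it avoids advertising a codec that will only fail later when a message is actually converted.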