>> The ones that I see most commonly are: US-ASCII, UTF-8, ISO-8859-1,
>> ISO-8859-2, ISO-8859-15, KOI8-R, ISO-2022-JP, GB2312, BIG5, EUC-KR,
>> WINDOWS-1251. However, others do appear.
If you intend to target the People's Republic of China, you MUST (by Chinese law) support GB18030.

I also think it's worth noting that many (perhaps most) people in Asia are not happy with Unicode (including UTF-8!) because of the "Han unification" effect. The basic problem is that Japanese, Chinese, and Korean share a large number of the same "characters", and when these are mapped to Unicode they "lose" their language, which makes it difficult to pick an appropriate font. Chinese characters CAN be displayed more or less intelligibly with a Japanese font (and vice versa), but to a Chinese reader the result "looks" Japanese (and vice versa). Although this is "only a font problem", it interferes with the acceptance of Unicode: it's a cultural identity issue and, I think, will not be easily resolved.

I think the bottom line is that if you map any of the Asian character sets into UTF-8, you should probably remember the original character set so you can map the text back. Originating e-mail in an Asian localized client using only UTF-8 is not likely to be acceptable.
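
To make the "remember the original character set" point concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not a description of any particular mail client: the names DecodedText, decode_remembering_charset, and encode_for_reply are hypothetical, and the fallback to UTF-8 when the text no longer fits the original charset is just one possible policy. The codec names are the ones Python's standard library ships.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DecodedText:
    """Unicode text plus the charset it arrived in (hypothetical helper type)."""
    text: str
    source_charset: str


def decode_remembering_charset(raw: bytes, charset: str) -> DecodedText:
    # Decode to Unicode for internal processing, but keep the charset label
    # so the language/encoding context is not lost.
    return DecodedText(text=raw.decode(charset), source_charset=charset)


def encode_for_reply(msg: DecodedText) -> Tuple[bytes, str]:
    # Prefer the charset the text originally arrived in.
    try:
        return msg.text.encode(msg.source_charset), msg.source_charset
    except UnicodeEncodeError:
        # Assumed policy: fall back to UTF-8 only when the text contains
        # characters the original charset cannot represent.
        return msg.text.encode("utf-8"), "utf-8"


if __name__ == "__main__":
    incoming = "日本語".encode("iso2022_jp")
    msg = decode_remembering_charset(incoming, "iso2022_jp")
    body, charset = encode_for_reply(msg)
    print(charset, body == incoming)
```

The remembered label travels with the text, so a reply can go back out in the sender's own charset instead of being forced into UTF-8.
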
-Rick Block
