Hi, I'm not sure I understood the issue.
If I understand correctly, mime4j is working as intended, but your input
string declares the wrong charset. Is that right?
You say this "wrong charset" is declared by many commonly used mail clients,
so you expect mime4j to provide a workaround for it. Is that right?

What JVM are you using?

Stefano

2011/8/2 Sharma, Ashish <[email protected]>:
> Hi,
>
> I am trying to decode MIME encoded words (the original string is in Chinese
> characters) using DecoderUtil.decodeEncodedWords().
>
> The following is the sample code:
>
> @Test
> public void testEncoding() throws UnsupportedEncodingException, IOException {
>     String str = "=?gb2312?B?ztKyu8rH1tCH+LmyrmEudHh0?=";
>     str = str + "\r\n ";
>     str = str + "=?gb2312?B?ztLKx9bQufrIyy50eHQ=?=";
>     str = DecoderUtil.decodeEncodedWords(str);
>     File file = new File("C://chinese2.txt");
>     FileOutputStream fileOut = new FileOutputStream(file);
>     fileOut.write(str.getBytes("gb2312"));
>     fileOut.flush();
>     fileOut.close();
> }
>
> With the above code the characters come out corrupted.
>
> The problem is the character set: most mail clients set the charset to
> GB2312, but to decode the characters correctly I had to use GB18030 in the
> above code. (See this for more info:
> http://stackoverflow.com/questions/3856920/character-corruption-for-chinese-simple-and-traditional-and-korean-texts)
>
> The following is the generalization I made to replace the character sets
> sent by mail clients so that characters decode correctly:
>
> 1. For any of the following Chinese charsets:
>
>    iso-ir-58,chinese,gbk,cn-gb,csgb2312,csiso58gb231280,euc-cn,euc_cn,euccn,gb2312,gb_2312-80,x-EUC-CN,gb2312-1980,gb2312-80
>
>    replace it with: GB18030
>
> 2. For any of the following Korean charsets:
>
>    5601,ksc5601-1987,ksc5601_1987,euckr,ksc5601,ksc_5601,euc_kr,csEUCKR,ks_c_5601-1987
>
>    replace it with: EUC-KR
>
> 3. For any of the following Thai charsets:
>
>    ms-874,ms874,windows-874,cp874,874,cs874,ibm874
>
>    replace it with: TIS-620
>
> I suggest that a charset fallback be provided in the
> DecoderUtil.decodeEncodedWords() method itself.
>
> For more info, see also http://wiki.whatwg.org/wiki/Web_Encodings.
>
> Please reply with your comments.
>
> Thanks
> Ashish Sharma
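
For reference, the charset substitution proposed in the quoted message can be sketched with the JDK alone, outside mime4j. This is only a rough illustration under stated assumptions: the class name CharsetFallback and the methods resolve and decodeBase64Payload are hypothetical and are not part of mime4j, and the alias table holds only a subset of the names listed above. It decodes the Base64 payload of the second encoded word using the substituted charset instead of the declared one.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not a mime4j class: maps a declared charset name
// to a superset charset before decoding.
class CharsetFallback {
    private static final Map<String, String> FALLBACKS = new HashMap<>();

    static {
        // Subset of the Chinese aliases from the quoted message.
        for (String alias : new String[] {"gb2312", "gbk", "euc-cn", "cn-gb", "csgb2312"}) {
            FALLBACKS.put(alias, "GB18030");
        }
        // Subset of the Korean aliases from the quoted message.
        for (String alias : new String[] {"ks_c_5601-1987", "ksc5601", "euckr", "euc_kr"}) {
            FALLBACKS.put(alias, "EUC-KR");
        }
    }

    // Returns the substitute charset if one is configured and supported by
    // this JVM; otherwise falls back to the declared charset.
    public static Charset resolve(String declared) {
        String key = declared.toLowerCase(Locale.ROOT);
        String name = FALLBACKS.getOrDefault(key, declared);
        if (!Charset.isSupported(name)) {
            name = declared;
        }
        return Charset.forName(name);
    }

    // Decodes the Base64 payload of a "B"-encoded word with the resolved charset.
    public static String decodeBase64Payload(String declaredCharset, String base64Payload) {
        byte[] raw = Base64.getDecoder().decode(base64Payload);
        return new String(raw, resolve(declaredCharset));
    }

    public static void main(String[] args) {
        // Payload of the second encoded word in the quoted test.
        System.out.println(decodeBase64Payload("gb2312", "ztLKx9bQufrIyy50eHQ="));
    }
}
```

Whether such a mapping belongs inside DecoderUtil.decodeEncodedWords() or in a step applied before calling it is a design question the sketch does not settle. It only shows that the substitution is a lookup on the declared charset name.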
