Re: Mime word not getting decoded using mime4j

Stefano Bagnara Tue, 02 Aug 2011 02:25:20 -0700

Hi,

I'm not sure I understood the issue.


If I understand it correctly, mime4j is working fine but your input
string declares the wrong charset. Is this right?

You say this "wrong charset" is declared by many commonly used mail
clients, and so you expect mime4j to have a workaround for that, right?

What JVM are you using?

Stefano

2011/8/2 Sharma, Ashish <[email protected]>:
> Hi,
>
> I am trying to decode mime words (the original string is in Chinese 
> characters) using DecoderUtil.decodeEncodedWords().
>
> Following is the sample code :
>
> @Test
> public void testEncoding() throws UnsupportedEncodingException, IOException {
>     String str = "=?gb2312?B?ztKyu8rH1tCH+LmyrmEudHh0?=";
>     str = str + "\r\n ";
>     str = str + "=?gb2312?B?ztLKx9bQufrIyy50eHQ=?=";
>     str = DecoderUtil.decodeEncodedWords(str);
>     File file = new File("C://chinese2.txt");
>     FileOutputStream fileOut = new FileOutputStream(file);
>     fileOut.write(str.getBytes("gb2312"));
>     fileOut.flush();
>     fileOut.close();
> }
>
> With the above code, the decoded characters come out corrupted.
>
> The problem is with the character set: most mail clients set the charset
> to GB2312, but to decode the characters correctly I had to use GB18030
> in the above code. (See this for more info:
> http://stackoverflow.com/questions/3856920/character-corruption-for-chinese-simple-and-traditional-and-korean-texts)
>
> The following is the generalization I made for replacing the character
> sets sent by mail clients so that the characters decode correctly:
>
> 1. For any of the following Chinese charsets:
>
>        iso-ir-58, chinese, gbk, cn-gb, csgb2312, csiso58gb231280, euc-cn,
>        euc_cn, euccn, gb2312, gb_2312-80, x-EUC-CN, gb2312-1980, gb2312-80
>
>        replace it with: GB18030
>
> 2. For any of the following Korean charsets:
>
>        5601, ksc5601-1987, ksc5601_1987, euckr, ksc5601, ksc_5601, euc_kr,
>        csEUCKR, ks_c_5601-1987
>
>        replace it with: EUC-KR
>
> 3. For any of the following Thai charsets:
>
>        ms-874, ms874, windows-874, cp874, 874, cs874, ibm874
>
>        replace it with: TIS-620
>
>
> I suggest that the "DecoderUtil.decodeEncodedWords()" method itself
> should provide a charset fallback.
>
> For more info, see also http://wiki.whatwg.org/wiki/Web_Encodings.
>
> Please reply with your comments.
>
> Thanks
> Ashish Sharma
>
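
The charset substitution proposed in the quoted message could be sketched
outside mime4j roughly as follows. This is only an illustration using plain
JDK classes: the class CharsetFallback and its methods resolve() and
decodeBWord() are hypothetical names, not part of mime4j, and the sketch
assumes a JVM (Java 8 or later) that ships the GB18030, EUC-KR and TIS-620
charsets. It handles a single Base64 ("B") payload only and does not parse
full encoded words.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of mime4j: maps a declared charset label
// to the replacement charset listed in the quoted message.
final class CharsetFallback {
    private static final Map<String, String> FALLBACK = new HashMap<>();

    private static void register(String target, String... labels) {
        for (String label : labels) {
            FALLBACK.put(label.toLowerCase(Locale.ROOT), target);
        }
    }

    static {
        register("GB18030", "iso-ir-58", "chinese", "gbk", "cn-gb", "csgb2312",
                "csiso58gb231280", "euc-cn", "euc_cn", "euccn", "gb2312",
                "gb_2312-80", "x-EUC-CN", "gb2312-1980", "gb2312-80");
        register("EUC-KR", "5601", "ksc5601-1987", "ksc5601_1987", "euckr",
                "ksc5601", "ksc_5601", "euc_kr", "csEUCKR", "ks_c_5601-1987");
        register("TIS-620", "ms-874", "ms874", "windows-874", "cp874", "874",
                "cs874", "ibm874");
    }

    // Returns the replacement charset if the label is in the table,
    // otherwise the charset the label names (case-insensitive lookup).
    static Charset resolve(String declared) {
        String label = declared.trim();
        String target = FALLBACK.get(label.toLowerCase(Locale.ROOT));
        return Charset.forName(target != null ? target : label);
    }

    // Decodes the Base64 text of one "B" encoded word using the resolved charset.
    static String decodeBWord(String declared, String base64Text) {
        byte[] raw = Base64.getDecoder().decode(base64Text);
        return new String(raw, resolve(declared));
    }

    public static void main(String[] args) {
        // Payload of the second encoded word from the quoted test case.
        String decoded = decodeBWord("gb2312", "ztLKx9bQufrIyy50eHQ=");
        System.out.println(decoded.length() + " chars decoded with " + resolve("gb2312"));
    }
}
```

Keeping the table in a separate helper leaves the decoder itself untouched,
so the substitution can be applied (or not) by the caller. Whether mime4j
should do something like this internally is the open question of this thread.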
