Mime word not getting decoded using mime4j

Sharma, Ashish Tue, 02 Aug 2011 02:14:41 -0700

Hi,

I am trying to decode MIME encoded words (the original strings are in Chinese 
characters) using DecoderUtil.decodeEncodedWords().


The following is the sample code:

        @Test
        public void testEncoding() throws IOException {
                // Two encoded words labelled gb2312, joined by folding whitespace
                String str = "=?gb2312?B?ztKyu8rH1tCH+LmyrmEudHh0?=";
                str = str + "\r\n ";
                str = str + "=?gb2312?B?ztLKx9bQufrIyy50eHQ=?=";
                str = DecoderUtil.decodeEncodedWords(str);
                File file = new File("C://chinese2.txt");
                // Write the decoded text back out as gb2312 bytes
                FileOutputStream fileOut = new FileOutputStream(file);
                try {
                        fileOut.write(str.getBytes("gb2312"));
                        fileOut.flush();
                } finally {
                        // Close the stream even if the write fails
                        fileOut.close();
                }
        }

With the above code, the decoded characters come out corrupted.

The problem is with the character set: most mail clients label the charset as 
GB2312, but to decode the characters correctly I had to use GB18030 in the 
above code. For more information, see:
http://stackoverflow.com/questions/3856920/character-corruption-for-chinese-simple-and-traditional-and-korean-texts
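
To make the mismatch concrete, here is a minimal sketch that decodes the Base64 payload of the first encoded word directly, once as GB2312 and once as GB18030. It assumes Java 8 or later (for java.util.Base64) and a JVM that ships both charsets; the class and method names are illustrative only and are not part of mime4j.

```java
import java.nio.charset.Charset;
import java.util.Base64;

class Gb2312VersusGb18030 {
    // Decode a Base64 payload and interpret the raw bytes in the given charset.
    static String decode(String base64, String charsetName) {
        byte[] raw = Base64.getDecoder().decode(base64);
        // new String(...) substitutes the replacement character for bytes
        // that are not valid in the requested charset.
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Payload of the first encoded word from the sample code.
        String payload = "ztKyu8rH1tCH+LmyrmEudHh0";
        System.out.println("GB2312 : " + decode(payload, "GB2312"));
        System.out.println("GB18030: " + decode(payload, "GB18030"));
    }
}
```

The payload contains the byte pairs 0x87 0xF8 and 0xAE 0x61, which fall in the GBK extension range rather than in GB2312 proper, so the two decodings are expected to differ.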

The following is the generalization I made: the charsets sent by mail clients 
are replaced as shown below so that the characters decode correctly.

1. For any of the following Chinese charsets:

        iso-ir-58,chinese,gbk,cn-gb,csgb2312,csiso58gb231280,euc-cn,euc_cn,euccn,gb2312,gb_2312-80,x-EUC-CN,gb2312-1980,gb2312-80

        replace it with: GB18030

2. For any of the following Korean charsets:

        5601,ksc5601-1987,ksc5601_1987,euckr,ksc5601,ksc_5601,euc_kr,csEUCKR,ks_c_5601-1987

        replace it with: EUC-KR

3. For any of the following Thai charsets:

        ms-874,ms874,windows-874,cp874,874,cs874,ibm874

        replace it with: TIS-620
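
The three rules above can be sketched as a small lookup table. This is only an illustration of the mapping described in this mail: the class name CharsetFallback and its resolve() method are hypothetical and are not part of mime4j, and labels are matched case-insensitively as an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CharsetFallback {
    // Lower-cased charset label -> replacement charset name.
    private static final Map<String, String> FALLBACK = new HashMap<>();

    private static void register(String target, String... aliases) {
        for (String alias : aliases) {
            FALLBACK.put(alias.toLowerCase(Locale.ROOT), target);
        }
    }

    static {
        register("GB18030", "iso-ir-58", "chinese", "gbk", "cn-gb", "csgb2312",
                "csiso58gb231280", "euc-cn", "euc_cn", "euccn", "gb2312",
                "gb_2312-80", "x-EUC-CN", "gb2312-1980", "gb2312-80");
        register("EUC-KR", "5601", "ksc5601-1987", "ksc5601_1987", "euckr",
                "ksc5601", "ksc_5601", "euc_kr", "csEUCKR", "ks_c_5601-1987");
        register("TIS-620", "ms-874", "ms874", "windows-874", "cp874", "874",
                "cs874", "ibm874");
    }

    // Returns the replacement charset, or the label unchanged if no rule applies.
    static String resolve(String label) {
        if (label == null) {
            return null;
        }
        String mapped = FALLBACK.get(label.trim().toLowerCase(Locale.ROOT));
        return mapped != null ? mapped : label;
    }
}
```

For example, CharsetFallback.resolve("gb2312") yields "GB18030", while a label that is not in the table, such as "utf-8", is returned unchanged.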
        

I suggest that the DecoderUtil.decodeEncodedWords() method itself should 
provide this charset fallback.
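
Until such a fallback exists in the library, one possible workaround is to rewrite the charset label of each encoded word before handing the string to the decoder. The sketch below is an assumption about how that could look, not how mime4j works internally: EncodedWordCharsetRewriter, fallbackFor() and rewrite() are hypothetical names, the pattern is a simplified form of the RFC 2047 encoded-word syntax, and only a few labels are mapped to keep it short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodedWordCharsetRewriter {
    // Simplified RFC 2047 shape: =?charset?encoding?encoded-text?=
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    // Hypothetical, abbreviated fallback rule; unknown labels pass through.
    static String fallbackFor(String charset) {
        if (charset.equalsIgnoreCase("gb2312") || charset.equalsIgnoreCase("gbk")) {
            return "GB18030";
        }
        if (charset.equalsIgnoreCase("ks_c_5601-1987")) {
            return "EUC-KR";
        }
        return charset;
    }

    // Replace the charset label in every encoded word; other text is untouched.
    static String rewrite(String header) {
        Matcher m = ENCODED_WORD.matcher(header);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = "=?" + fallbackFor(m.group(1)) + "?" + m.group(2)
                    + "?" + m.group(3) + "?=";
            m.appendReplacement(out, Matcher.quoteReplacement(word));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The rewritten string could then be passed to DecoderUtil.decodeEncodedWords(); whether the replacement charset is actually available depends on the JVM in use.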

For more information, see also http://wiki.whatwg.org/wiki/Web_Encodings.

Please reply with your comments.

Thanks
Ashish Sharma
