Stefano, I have run into the wrong-encoding problem with the following clients:
1. Google webmail client
2. Yahoo webmail client
3. AOL webmail client

and many others. I have also found that most web browsers have built-in algorithms for detecting character encodings (especially for East and Southeast Asian charsets) precisely to work around the problem I am facing. So I believe mime4j should offer a similar facility.

Thanks
Ashish

-----Original Message-----
From: Stefano Bagnara [mailto:[email protected]]
Sent: Wednesday, August 03, 2011 8:51 PM
To: [email protected]
Subject: Re: Mime word not getting decoded using mime4j

2011/8/3 Sharma, Ashish <[email protected]>:
> Stefano,
>
>>>You say this "wrong charset" is reported by many commonly used mail
>>>clients and so you expect mime4j to have a workaround for that: right?
>
> Yes, you understood correctly.

First we need to identify which clients produce the wrong encodings. Can you provide a list?

Stefano

> My JVM details are as follows:
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)
>
> Thanks
> Ashish
>
> -----Original Message-----
> From: Stefano Bagnara [mailto:[email protected]]
> Sent: Tuesday, August 02, 2011 2:54 PM
> To: [email protected]
> Subject: Re: Mime word not getting decoded using mime4j
>
> Hi,
>
> I'm not sure I understood the issue.
>
> If I understand it correctly, mime4j is working correctly but your input
> string declares the wrong charset. Is this right?
>
> You say this "wrong charset" is reported by many commonly used mail
> clients and so you expect mime4j to have a workaround for that: right?
>
> What JVM are you using?
>
> Stefano
>
> 2011/8/2 Sharma, Ashish <[email protected]>:
>> Hi,
>>
>> I am trying to decode MIME encoded words (the original string is in Chinese
>> characters) using DecoderUtil.decodeEncodedWords().
>>
>> Here is the sample code:
>>
>> @Test
>> public void testEncoding() throws UnsupportedEncodingException,
>>         IOException {
>>     String str = "=?gb2312?B?ztKyu8rH1tCH+LmyrmEudHh0?=";
>>     str = str + "\r\n ";
>>     str = str + "=?gb2312?B?ztLKx9bQufrIyy50eHQ=?=";
>>     str = DecoderUtil.decodeEncodedWords(str);
>>     File file = new File("C://chinese2.txt");
>>     FileOutputStream fileOut = new FileOutputStream(file);
>>     fileOut.write(str.getBytes("gb2312"));
>>     fileOut.flush();
>>     fileOut.close();
>> }
>>
>> In the above code the decoded characters come out corrupted.
>>
>> The problem is the character set: most mail clients label the text as
>> GB2312, but to decode the characters correctly I had to use GB18030 in the
>> code above. (For more info, see:
>> http://stackoverflow.com/questions/3856920/character-corruption-for-chinese-simple-and-traditional-and-korean-texts)
>>
>> Following is the generalization I made to replace the character sets
>> sent by mail clients so that the characters decode correctly:
>>
>> 1. For any of the following Chinese charsets:
>>
>>    iso-ir-58,chinese,gbk,cn-gb,csgb2312,csiso58gb231280,euc-cn,euc_cn,euccn,gb2312,gb_2312-80,x-EUC-CN,gb2312-1980,gb2312-80
>>
>>    replace it with: GB18030
>>
>> 2. For any of the following Korean charsets:
>>
>>    5601,ksc5601-1987,ksc5601_1987,euckr,ksc5601,ksc_5601,euc_kr,csEUCKR,ks_c_5601-1987
>>
>>    replace it with: EUC-KR
>>
>> 3. For any of the following Thai charsets:
>>
>>    ms-874,ms874,windows-874,cp874,874,cs874,ibm874
>>
>>    replace it with: TIS-620
>>
>> I suggest that charset fallback be provided in the
>> DecoderUtil.decodeEncodedWords() method itself.
>>
>> For more info, also refer to http://wiki.whatwg.org/wiki/Web_Encodings.
>>
>> Please send your comments.
>>
>> Thanks
>> Ashish Sharma
>>
>
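To see the GB2312/GB18030 mismatch described in the thread without mime4j in the loop, one can Base64-decode the payload of the first encoded word from the sample code and turn the same bytes into text under both labels. The class and method names below are hypothetical, and the sketch assumes Java 8+ for `java.util.Base64` (the thread itself was on Java 6). The first payload contains byte pairs such as `0x87 0xF8` that lie outside GB2312's range but are valid GBK/GB18030 codes, so only the GB2312 decode should contain U+FFFD replacement characters.

```java
import java.nio.charset.Charset;
import java.util.Base64;

// Hypothetical demo: decode the Base64 payload of an RFC 2047 "B" encoded
// word with a caller-chosen charset, so the same bytes can be compared
// under two different charset labels.
public class CharsetMismatchDemo {

    public static String decodePayload(String base64, String charsetName) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        // new String(byte[], Charset) substitutes U+FFFD for undecodable input.
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Payload of the first encoded word in the sample code.
        String payload = "ztKyu8rH1tCH+LmyrmEudHh0";
        String asGb2312 = decodePayload(payload, "GB2312");
        String asGb18030 = decodePayload(payload, "GB18030");
        System.out.println("GB2312 contains U+FFFD:  " + asGb2312.contains("\uFFFD"));
        System.out.println("GB18030 contains U+FFFD: " + asGb18030.contains("\uFFFD"));
    }
}
```

A related point about the sample test: even after a correct decode, `str.getBytes("gb2312")` cannot represent the GBK-only characters and replaces them with `?`, so writing the file as GB18030 or UTF-8 would keep the output from hiding a successful decode.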
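The proposal to apply a charset fallback inside the decoding step could look roughly like the following. This is a minimal sketch, not mime4j's API: `FallbackWordDecoder`, `resolve` and `decode` are hypothetical names, only the "B" (Base64) encoding is handled (Q-encoded words pass through unchanged), and the alias table is copied from the thread's list. Following RFC 2047, whitespace between two adjacent encoded words is dropped. The Thai and Korean rows may deserve a second look before adoption: the WHATWG Encoding Standard treats `tis-620` as a label for windows-874 (the superset), and Java's `EUC-KR` does not include the Windows-949 extensions.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not part of mime4j): decodes RFC 2047 "B" encoded
// words, substituting a superset charset for commonly mislabelled ones.
public class FallbackWordDecoder {

    // Declared label (lower case) -> charset actually used for decoding,
    // following the mapping proposed in the thread.
    private static final Map<String, String> FALLBACK = new HashMap<>();

    static {
        for (String s : ("iso-ir-58,chinese,gbk,cn-gb,csgb2312,csiso58gb231280,euc-cn,euc_cn,"
                + "euccn,gb2312,gb_2312-80,x-euc-cn,gb2312-1980,gb2312-80").split(",")) {
            FALLBACK.put(s, "GB18030");
        }
        for (String s : ("5601,ksc5601-1987,ksc5601_1987,euckr,ksc5601,ksc_5601,euc_kr,"
                + "cseuckr,ks_c_5601-1987").split(",")) {
            FALLBACK.put(s, "EUC-KR");
        }
        for (String s : "ms-874,ms874,windows-874,cp874,874,cs874,ibm874".split(",")) {
            FALLBACK.put(s, "TIS-620");
        }
    }

    // Matches =?charset?B?payload?= ; Q-encoded words are not matched.
    private static final Pattern WORD =
            Pattern.compile("=\\?([^?]+)\\?[bB]\\?([^?]*)\\?=");

    // Maps a declared charset label to the charset used for decoding.
    // Unknown labels are passed to Charset.forName unchanged (which throws
    // UnsupportedCharsetException if the JVM does not support them).
    public static Charset resolve(String declared) {
        String label = declared.trim().toLowerCase(Locale.ROOT);
        return Charset.forName(FALLBACK.getOrDefault(label, declared.trim()));
    }

    public static String decode(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        boolean prevWasWord = false;
        while (m.find()) {
            String between = text.substring(last, m.start());
            // RFC 2047: whitespace separating two encoded words is ignored.
            if (!(prevWasWord && between.trim().isEmpty())) {
                out.append(between);
            }
            byte[] bytes = Base64.getDecoder().decode(m.group(2));
            out.append(new String(bytes, resolve(m.group(1))));
            last = m.end();
            prevWasWord = true;
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        // The folded header value from the sample code.
        String header = "=?gb2312?B?ztKyu8rH1tCH+LmyrmEudHh0?=\r\n"
                + " =?gb2312?B?ztLKx9bQufrIyy50eHQ=?=";
        System.out.println(decode(header));
    }
}
```

Keeping the alias table separate from the decoding loop means the fallback list can be extended (or made configurable) without touching the RFC 2047 parsing.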
