RE: Mime word not getting decoded using mime4j

Sharma, Ashish Mon, 17 Oct 2011 01:43:55 -0700

Stefano,

I have faced the issue of wrong encoding with the following clients:


1. Google webmail client.
2. Yahoo webmail client.
3. AOL webmail client, and many more.

Moreover, I have found that most web browsers have built-in 
algorithms to detect character encodings (especially for South East Asian 
charsets) to circumvent the problems that I am facing.
So I believe such a facility should be present in mime4j too.
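
A facility of that kind could be sketched roughly as follows. This is only an illustration under stated assumptions, not mime4j API: the class name CharsetFallback and the methods resolve() and decodeB() are hypothetical, the replacement table is the one from the original message quoted below, only "B" (Base64) payloads are handled, and java.util.Base64 requires Java 8 or later.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of mime4j: maps a declared charset label
// to the charset that should actually be used for decoding.
final class CharsetFallback {
    // Key: lower-cased declared label. Value: replacement charset name.
    private static final Map<String, String> FALLBACK = new HashMap<>();

    private static void register(String target, String... labels) {
        for (String label : labels) {
            FALLBACK.put(label.toLowerCase(Locale.ROOT), target);
        }
    }

    static {
        // Replacement table taken from the quoted original message.
        register("GB18030", "iso-ir-58", "chinese", "gbk", "cn-gb", "csgb2312",
                "csiso58gb231280", "euc-cn", "euc_cn", "euccn", "gb2312",
                "gb_2312-80", "x-EUC-CN", "gb2312-1980", "gb2312-80");
        register("EUC-KR", "5601", "ksc5601-1987", "ksc5601_1987", "euckr",
                "ksc5601", "ksc_5601", "euc_kr", "csEUCKR", "ks_c_5601-1987");
        register("TIS-620", "ms-874", "ms874", "windows-874", "cp874", "874",
                "cs874", "ibm874");
    }

    // Returns the replacement name, or the declared label unchanged
    // when no rule applies.
    static String resolve(String declared) {
        String mapped = FALLBACK.get(declared.trim().toLowerCase(Locale.ROOT));
        return mapped != null ? mapped : declared;
    }

    // Decodes the Base64 payload of a "B" encoded word with the resolved
    // charset. Assumes the JVM ships the resolved charset.
    static String decodeB(String declaredCharset, String base64Payload) {
        byte[] raw = Base64.getDecoder().decode(base64Payload);
        return new String(raw, Charset.forName(resolve(declaredCharset)));
    }

    public static void main(String[] args) {
        System.out.println(resolve("gb2312")); // prints GB18030
        // Payload of the second encoded word from the sample test.
        System.out.println(decodeB("gb2312", "ztLKx9bQufrIyy50eHQ="));
    }
}
```

The lookup is case-insensitive because clients do not agree on the case of charset labels, and unknown labels pass through untouched so that correctly declared messages are not affected.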

Thanks
Ashish

-----Original Message-----
From: Stefano Bagnara [mailto:[email protected]] 
Sent: Wednesday, August 03, 2011 8:51 PM
To: [email protected]
Subject: Re: Mime word not getting decoded using mime4j

2011/8/3 Sharma, Ashish <[email protected]>:
> Stefano,
>
>>>You say this "wrong charset" is reported by many commonly used mail
>>>clients and so you expect mime4j to have a workaround for that: right?
>
> Yes, you understood right.

First, we need to identify which clients produce the wrong encodings.
Can you provide a list?

Stefano

> My JVM details are as follows:
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)
>
> Thanks
> Ashish
>
> -----Original Message-----
> From: Stefano Bagnara [mailto:[email protected]]
> Sent: Tuesday, August 02, 2011 2:54 PM
> To: [email protected]
> Subject: Re: Mime word not getting decoded using mime4j
>
> Hi,
>
> I'm not sure I understood the issue.
>
> If I understand it correctly, mime4j is working correctly but your input
> string declares a wrong charset. Is this right?
>
> You say this "wrong charset" is reported by many commonly used mail
> clients and so you expect mime4j to have a workaround for that: right?
>
> What JVM are you using?
>
> Stefano
>
> 2011/8/2 Sharma, Ashish <[email protected]>:
>> Hi,
>>
>> I am trying to decode mime words (the original string is in Chinese 
>> characters) using DecoderUtil.decodeEncodedWords().
>>
>> The following is the sample code:
>>
>> @Test
>> public void testEncoding() throws UnsupportedEncodingException, IOException {
>>     String str = "=?gb2312?B?ztKyu8rH1tCH+LmyrmEudHh0?=";
>>     str = str + "\r\n ";
>>     str = str + "=?gb2312?B?ztLKx9bQufrIyy50eHQ=?=";
>>     str = DecoderUtil.decodeEncodedWords(str);
>>     File file = new File("C://chinese2.txt");
>>     FileOutputStream fileOut = new FileOutputStream(file);
>>     fileOut.write(str.getBytes("gb2312"));
>>     fileOut.flush();
>>     fileOut.close();
>> }
>>
>> With the above code the decoded characters appear corrupted.
>>
>> The problem here is the character set: most mail clients declare the 
>> charset as GB2312, but to decode the characters correctly I had to 
>> use GB18030 in the above code. (See this for more info: 
>> http://stackoverflow.com/questions/3856920/character-corruption-for-chinese-simple-and-traditional-and-korean-texts)
>>
>> The following is the generalization I made to replace the character sets 
>> sent by mail clients so that the characters decode correctly:
>>
>> 1. For any of the following Chinese charsets:
>>
>>        iso-ir-58,chinese,gbk,cn-gb,csgb2312,csiso58gb231280,euc-cn,euc_cn,euccn,gb2312,gb_2312-80,x-EUC-CN,gb2312-1980,gb2312-80
>>
>>        replace it with: GB18030
>>
>> 2. For any of the following Korean charsets:
>>
>>        5601,ksc5601-1987,ksc5601_1987,euckr,ksc5601,ksc_5601,euc_kr,csEUCKR,ks_c_5601-1987
>>
>>        replace it with: EUC-KR
>>
>> 3. For any of the following Thai charsets:
>>
>>        ms-874,ms874,windows-874,cp874,874,cs874,ibm874
>>
>>        replace it with: TIS-620
>>
>>
>> I suggest that the "DecoderUtil.decodeEncodedWords()" method itself 
>> should provide a charset fallback.
>>
>> For more info, see also http://wiki.whatwg.org/wiki/Web_Encodings.
>>
>> Please reply with your comments.
>>
>> Thanks
>> Ashish Sharma
>>
>
