Most messages with subjects and From: headers using characters outside the 
ASCII set now use the RFC 2047 encoding to keep the actual bytes in the message 
"7-bit safe". But there are still a significant number of messages coming in 
which use national encodings: big5 from China, Taiwan, and Singapore; EUC-JIS 
and shift-JIS from Japan; cp1255 from Israel; etc.
What is the best way to convert these strings into UTF-8?
Since these contain 8-bit characters, I tried using g_mime_utils_decode_8bit 
with a NULL encoding, assuming it would determine the best one to use. But in 
my test, this didn't work at all. My test consisted of:
- starting with one UTF-8 string for each of 4 encodings, the equivalent of:
  - "Happy New Year" in Chinese (big5)
  - "Good Morning" for shift-JIS
  - "Good Evening" for EUC-JIS
  - "Peace unto you" for cp1255
- I converted the UTF-8 to a byte sequence using the corresponding encoding.
- I then fed the four resulting byte sequences to g_mime_utils_decode_8bit
  and wrote out the results.
I confirmed that the inputs to g_mime_utils_decode_8bit were correctly encoded 
by decoding them with the proper encoding.
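For anyone who wants to reproduce the preparation and confirmation steps, 
here is a rough sketch of how they could be done with plain POSIX iconv. This 
is not my actual test code and it does not call GMime at all. The helper names 
convert_charset and round_trips are made up for this illustration, and the 
charset names your iconv accepts (for example "BIG5", "SHIFT_JIS", "EUC-JP", 
"CP1255") depend on the platform.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert `len` bytes of `in` from charset `from` to
 * charset `to`. Returns a malloc'ed, NUL-terminated buffer (caller frees),
 * or NULL if the charset pair is unsupported or the input is not valid in
 * `from`. The output length is stored in *out_len when out_len is non-NULL.
 */
char *convert_charset(const char *from, const char *to,
                      const char *in, size_t len, size_t *out_len)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    /* Assumption: 4 output bytes per input byte (plus slack) is enough
     * for the charsets discussed here. */
    size_t cap = len * 4 + 16;
    char *out = malloc(cap + 1);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;   /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t inleft = len;
    size_t outleft = cap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }

    *outp = '\0';
    if (out_len != NULL)
        *out_len = (size_t)(outp - out);
    return out;
}

/*
 * Hypothetical check: encode a UTF-8 string into `charset`, decode it back,
 * and compare. Returns 1 if the text survives the round trip, 0 if it comes
 * back different, and -1 if either conversion could not be performed.
 */
int round_trips(const char *charset, const char *utf8)
{
    size_t enc_len = 0;
    char *enc = convert_charset("UTF-8", charset, utf8, strlen(utf8), &enc_len);
    if (enc == NULL)
        return -1;

    char *back = convert_charset(charset, "UTF-8", enc, enc_len, NULL);
    free(enc);
    if (back == NULL)
        return -1;

    int same = (strcmp(back, utf8) == 0);
    free(back);
    return same;
}
```

The buffer produced by the first convert_charset call inside round_trips is 
the kind of 8-bit input that would then be handed to g_mime_utils_decode_8bit. 
A result of 1 from round_trips only shows that the test bytes are valid in the 
chosen encoding; it says nothing about whether a decoder can guess that 
encoding unaided.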
1. is g_mime_utils_decode_8bit the right tool for the job? I assume it works 
properly when one actually knows the encoding, but when one doesn't?
2. if so, how should I be using it, because:
        output_ptr = g_mime_utils_decode_8bit(NULL, input_ptr, input_length);
   isn't doing it.
3. if it isn't, what is the right way?