Hi Yuval,

It is the correct method to use. However, you need to specify the list of
charsets that it should attempt to try.

What you need to do is:

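/* NULL-terminated list of charsets to try. "euc-jp" is used here on the
 * assumption that it is the iconv name for what you call EUC-JIS. */
static const char *charsets[] = { "big5", "shift-jis", "euc-jp", "cp1255", NULL };
GMimeParserOptions *options;

/* Passing NULL clones the default parser options. */
options = g_mime_parser_options_clone (NULL);
g_mime_parser_options_set_fallback_charsets (options, charsets);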

Then pass those options as the first argument to g_mime_utils_decode_8bit(),
in place of the NULL you are passing now.

Hope that helps,

Jeff

From: gmime-devel-list <gmime-devel-list-boun...@gnome.org> on behalf of Yuval 
Peduel via gmime-devel-list <gmime-devel-list@gnome.org>
Reply-To: Yuval Peduel <yped...@yahoo-inc.com>
Date: Wednesday, August 9, 2017 at 2:00 PM
To: "gmime-devel-list@gnome.org" <gmime-devel-list@gnome.org>
Subject: [gmime-devel] determining encodings

Most messages with subjects and From: headers using characters outside the 
ASCII set now use the RFC-2047 encoding to keep the actual bytes in the message 
"7-bit safe". But there are still a significant number of messages coming in 
which use national encodings: big5 from China, Taiwan, and Singapore; EUC-JIS 
and shift-JIS from Japan; cp1255 from Israel; etc.

What is the best way to convert these strings into UTF-8?

Since these contain 8-bit characters, I tried using g_mime_utils_decode_8bit 
with a NULL encoding, assuming it would determine the best one to use. But in 
my test, this didn't work at all. My test consisted of:

- starting with one UTF-8 string for each of 4 encodings, the equivalent of
  - "Happy New Year" in Chinese (big5)
  - "Good Morning" for shift-JIS
  - "Good Evening" for EUC-JIS
  - "Peace unto you" for cp1255
- I converted the UTF-8 to a byte sequence using the corresponding encoding.
- I then fed the four resulting byte sequences to g_mime_utils_decode_8bit and 
wrote out the results

I confirmed that the inputs to g_mime_utils_decode_8bit were correctly encoded 
by decoding each one with its proper charset.

So:

1. is g_mime_utils_decode_8bit the right tool for the job? I assume it works 
properly when one actually knows the encoding, but when one doesn't?

2. if so, how should I be using it, because:
        output_ptr = g_mime_utils_decode_8bit(NULL, input_ptr, input_length);
   isn't doing it.

3. if it isn't, what is the right way?

TIA.
_______________________________________________
gmime-devel-list mailing list
gmime-devel-list@gnome.org
https://mail.gnome.org/mailman/listinfo/gmime-devel-list
