I finally found some time (and luck!) to hack on this piece.
Please incorporate the patch below.
Before this patch, multisync did not handle the UTF-7 charset (used at
least by the SonyEricsson T610 for non-ASCII strings) at all.
After this patch, IrMC device option "Translate from character set:
UTF-7" (in IrMC/Options/Bug workarounds) is finally capable of
handling UTF-7 in the following fashion:
1) check that the string does not break UTF-7 rules (ASCII characters
only, and every '+' must be closed by a '-' with only Base64
characters in between).
2) translate only the following tags: N (name), ORG (company), TITLE
(title), SUMMARY (subject), LOCATION (location), DESCRIPTION
(details). This is necessary to avoid converting phone numbers (and
other technical data?) which might look like UTF-7, e.g. +123-456-789.
3) if any of the above conditions fails, fall back to ISO8859-1
(Western charset).
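
The three steps above can be sketched as standalone C, outside of
multisync. This is only an illustration, not the patch itself: the
helper names (is_text_tag, looks_like_utf7, pick_charset) are made up
here, and POSIX strcasecmp stands in for GLib's g_strcasecmp.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX), stand-in for g_strcasecmp */

/* Hypothetical helper: step 2, only free-text tags are candidates. */
static bool is_text_tag(const char *name)
{
    static const char *const tags[] = {
        "N", "ORG", "TITLE", "SUMMARY", "LOCATION", "DESCRIPTION"
    };
    for (size_t i = 0; i < sizeof tags / sizeof tags[0]; ++i)
        if (strcasecmp(name, tags[i]) == 0)
            return true;
    return false;
}

/* Hypothetical helper: step 1, a syntactic plausibility check only. */
static bool looks_like_utf7(const char *value)
{
    /* UTF-7 text consists of ASCII bytes only. */
    for (const unsigned char *b = (const unsigned char *)value; *b; ++b)
        if (*b > 127)
            return false;

    /* Every '+' needs a closing '-', with only Base64 characters between. */
    const char *minus = value;
    const char *plus;
    while ((plus = strchr(minus, '+')) != NULL) {
        minus = strchr(plus, '-');
        if (minus == NULL)
            return false;
        for (const char *p = plus + 1; p != minus; ++p)
            if (!isalnum((unsigned char)*p) && *p != '+' && *p != '/')
                return false;
    }
    return true;
}

/* Step 3: fall back to ISO8859-1 whenever either check fails. */
static const char *pick_charset(const char *name, const char *value)
{
    if (is_text_tag(name) && looks_like_utf7(value))
        return "UTF-7";
    return "ISO8859-1";
}
```

Note that looks_like_utf7("+123-456-789") is true on its own, which is
exactly why the tag whitelist in step 2 is needed:
pick_charset("TEL", "+123-456-789") still yields "ISO8859-1".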
UTF-7 status with T610 after this patch:
Name, subject, location and description conversion passed all my tests
with contacts, calendar events and tasks.
Company and title become UTF-8 garbage after some back-and-forth
synchronizations (probably a separate bug, perhaps in the T610, or
perhaps UTF-7 is not used when sending changes to the IrMC device).
Other charsets are not affected by this patch.
Even Western characters like Æ and Ø are handled correctly with this
patch. There are two cases:
a) only Western characters are used: the T610 represents them in
ISO8859-1, so condition 1 fails and the code falls back to the correct
ISO8859-1;
b) mixed characters are used (e.g. Greek and/or Lithuanian added): the
T610 uses UTF-7, which this patch correctly detects and converts.
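
To make the UTF-7 case concrete: in UTF-7, Æ (U+00C6) is written as
"+AMY-", i.e. the UTF-16 code unit 0x00C6 in Base64 between '+' and
'-'. The patch leaves the actual decoding to iconv; the function below
is only a hand-written sketch of what that decoding amounts to. The
name utf7_to_utf8 is hypothetical, and the sketch assumes BMP
characters only (surrogate pairs are not combined).

```c
#include <stdlib.h>
#include <string.h>

/* Value of a Base64 digit, or -1 if c is not one. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Write one BMP code point as UTF-8; returns the number of bytes written. */
static size_t put_utf8(unsigned cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Returns a malloc'ed UTF-8 string, or NULL on non-ASCII input or
 * allocation failure. Illustration only, not the patch's code path. */
static char *utf7_to_utf8(const char *in)
{
    /* Generous bound: the output never exceeds 3 bytes per input byte. */
    char *result = malloc(3 * strlen(in) + 1);
    char *out = result;
    if (result == NULL)
        return NULL;

    while (*in) {
        if ((unsigned char)*in > 127) {      /* not valid UTF-7 */
            free(result);
            return NULL;
        }
        if (*in != '+') {                    /* directly encoded ASCII */
            *out++ = *in++;
            continue;
        }
        ++in;                                /* skip the '+' */
        if (*in == '-') {                    /* "+-" means a literal '+' */
            *out++ = '+';
            ++in;
            continue;
        }
        unsigned bits = 0;
        int nbits = 0, v;
        while ((v = b64_value(*in)) >= 0) {  /* collect 6 bits per digit */
            bits = (bits << 6) | (unsigned)v;
            nbits += 6;
            if (nbits >= 16) {               /* one UTF-16 code unit ready */
                nbits -= 16;
                out += put_utf8((bits >> nbits) & 0xFFFF, out);
                bits &= (1u << nbits) - 1;
            }
            ++in;
        }
        if (*in == '-')                      /* closing '-' is absorbed */
            ++in;
    }
    *out = '\0';
    return result;
}
```

With this sketch, "Hi +AMY-" decodes to "Hi Æ" (bytes 0xC3 0x86 in
UTF-8), while a raw ISO8859-1 byte such as 0xC6 is rejected, which
corresponds to case a) above.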
Conclusion:
In general, UTF-7 might be an even better choice than the default
ISO8859-1, especially for the T610.
diff -u'rNF^function' multisync-0.82/src/sync_vtype.c multisync-0.82-patched/src/sync_vtype.c
--- multisync-0.82/src/sync_vtype.c 2004-04-12 23:03:29.000000000 +0200
+++ multisync-0.82-patched/src/sync_vtype.c 2006-09-14 10:45:02.000000000 +0200
@@ -364,17 +364,56 @@
}
if ((opts & VOPTION_FIXCHARSET) && value && charset) {
iconv_t ic;
- int t;
+ int t, valuelen = strlen(value);
+ const char *realCharSet = charset;
gboolean highchar = FALSE;
- for (t = 0; t < strlen(value); t++)
+ gboolean fixcharset = FALSE;
+ for (t = 0; t < valuelen; ++t)
if (value[t] > 127)
highchar = TRUE;
- if (highchar) {
- ic = iconv_open("UTF-8", charset);
+ fixcharset = highchar;
+ if (g_strcasecmp(charset, "UTF-7")==0 ||
+ g_strcasecmp(charset, "UTF7")==0) {
+ // fix the charset only if the string is really valid UTF-7
+ // (otherwise iconv may get stuck and lose some data).
+ // UTF-7 is valid only if all chars are ASCII:
+ if (highchar) fixcharset = FALSE;
+ else {
+ // convert only selected items (e.g. taken from SonyEricsson T610)
+ // the other items are likely to be not UTF-7 encoded
+ // (e.g. TEL number "+123-331" is not UTF-7, although it looks like it).
+ if (g_strcasecmp(name, "N")==0 ||
+ g_strcasecmp(name, "ORG")==0 ||
+ g_strcasecmp(name, "TITLE")==0 ||
+ g_strcasecmp(name, "SUMMARY")==0 ||
+ g_strcasecmp(name, "LOCATION")==0 ||
+ g_strcasecmp(name, "DESCRIPTION")==0)
+ fixcharset = TRUE;
+ // every '+' should have a corresponding '-' and in between should
+ // be Base64 characters if any ("+-" means '+' in UTF-7):
+ const char *plus, *minus = value;
+ while (fixcharset && (plus = strchr(minus, '+')) != NULL) {
+ minus = strchr(plus, '-');
+ if (minus == NULL) { fixcharset = FALSE; break; }
+ const char *p;
+ for (p=plus+1; p!=minus; ++p)
+ if (!isalnum(*p) && (*p != '+') && (*p != '/')) {
+ fixcharset = FALSE; break;
+ }
+ }
+ }
+ if (!fixcharset) {
+ // at the end if UTF-7 fails, Western charset is the best bet:
+ realCharSet = "ISO8859-1";
+ fixcharset = TRUE;
+ }
+ }
+ if (fixcharset) {
+ ic = iconv_open("UTF-8", realCharSet);
if (ic >= 0) {
- char *utfvalue = g_malloc0(65536);
size_t inbytes = strlen(value);
- size_t outbytes = 65536;
+ size_t outbytes = inbytes*8;
+ char *utfvalue = g_malloc0(outbytes);
char *inbuf = value, *outbuf = utfvalue;
iconv(ic, &inbuf, &inbytes, &outbuf, &outbytes);
g_free(value);