Bug#274022: patch

Marius Mikučionis Thu, 14 Sep 2006 02:33:26 -0700

I finally found some time (and luck!) to hack this piece.


Please incorporate the patch:
Before this patch, multisync did not handle the UTF-7 charset at all
(UTF-7 is used at least by the Sony Ericsson T610 for non-ASCII strings).
With this patch, the IrMC device option "Translate from character set:
UTF-7" (in IrMC/Options/Bug workarounds) finally handles UTF-7 as
follows:
1) check that the string obeys the UTF-7 rules: ASCII characters only,
and every '+' must be closed by a '-' with only Base64 characters in
between.
2) translate only the following tags: N (name), ORG (company), TITLE
(title), SUMMARY (subject), LOCATION (location), DESCRIPTION
(details). This is necessary to avoid converting phone numbers (and
possibly other technical data) that merely look like UTF-7, e.g.
+123-456-789.
3) if any of the above conditions fails, fall back to ISO8859-1
(Western charset).
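The three rules above can be sketched as a standalone C function. This is a minimal sketch rather than the patch itself: `choose_charset` is a hypothetical name, the property name and raw value are passed in separately, and it returns the charset name to hand to iconv_open(). It uses POSIX strcasecmp() instead of GLib's g_strcasecmp() so it builds without GLib, and it tests bytes through an `unsigned char` cast, because a test like `value[t] > 127` can never be true where plain `char` is signed. Like the patch, it requires an explicit '-' after each Base64 run, which is stricter than RFC 2152 (there a run may end at any non-Base64 character); such strings simply fall back to ISO8859-1.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper, not part of multisync: returns the charset name the
 * patch would pass to iconv_open() for property `tag` with raw `value`
 * when the device is configured for UTF-7. */
static const char *choose_charset(const char *tag, const char *value)
{
    /* Free-text properties the patch treats as UTF-7 (taken from the T610). */
    static const char *const utf7_tags[] = {
        "N", "ORG", "TITLE", "SUMMARY", "LOCATION", "DESCRIPTION"
    };
    const char *plus, *minus;
    size_t i;
    int tag_ok = 0;

    /* Rule 1: UTF-7 is pure ASCII. The unsigned cast makes the test work
     * even where plain char is signed. */
    for (i = 0; value[i] != '\0'; ++i)
        if ((unsigned char)value[i] > 127)
            return "ISO8859-1";

    /* Rule 2: only whitelisted properties are decoded, so a phone number
     * such as "+123-456-789" is left alone. */
    for (i = 0; i < sizeof utf7_tags / sizeof utf7_tags[0]; ++i)
        if (strcasecmp(tag, utf7_tags[i]) == 0)
            tag_ok = 1;
    if (!tag_ok)
        return "ISO8859-1";

    /* Rule 1 (continued): every '+' must be closed by a '-', with only
     * Base64 characters in between; "+-" encodes a literal '+'. */
    minus = value;
    while ((plus = strchr(minus, '+')) != NULL) {
        const char *p;
        minus = strchr(plus, '-');
        if (minus == NULL)
            return "ISO8859-1";   /* unterminated shift sequence */
        for (p = plus + 1; p != minus; ++p)
            if (!isalnum((unsigned char)*p) && *p != '+' && *p != '/')
                return "ISO8859-1";
    }

    /* Rule 3 is the fallback taken above; everything passed, so use UTF-7. */
    return "UTF-7";
}
```

For example, choose_charset("N", "Hello +AMY-") returns "UTF-7" (+AMY- is Æ), while choose_charset("TEL", "+123-456-789") and choose_charset("N", "1+1") both return "ISO8859-1".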

UTF-7 status with the T610 after this patch:
Conversion of name, subject, location and description passed all my
tests with contacts, calendar events and tasks.
Company and title turn into UTF-8 garbage after a few back-and-forth
synchronizations (probably a separate bug, perhaps in the T610, or
UTF-7 is not used when sending changes to IrMC).

Other charsets are not affected by this patch.
Even Western characters like Æ and Ø are handled correctly, and there
are two cases. If only Western characters are used, the T610 represents
them in ISO8859-1, so condition 1 fails and the code falls back to the
correct ISO8859-1. If characters are mixed (e.g. Greek and/or Lithuanian
added), the T610 uses UTF-7, which this patch detects and converts
correctly.
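The two cases can be reproduced with POSIX iconv, which is what the patch calls. In this minimal sketch `to_utf8` is a hypothetical helper, and the inputs are hand-encoded test strings, not data captured from a T610: "\xC6\xD8" is "ÆØ" in ISO8859-1, and "+AMYBBA-" is "ÆĄ" (Æ plus Lithuanian Ą, i.e. U+00C6 U+0104 as UTF-16, Base64-encoded) in UTF-7. The helper checks iconv_open() against (iconv_t)-1, which is how POSIX reports failure, and flushes the converter state before closing. It assumes an iconv that provides UTF-7, such as glibc's, and uses the canonical name "ISO-8859-1".

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string `in` from charset
 * `from` to a newly allocated UTF-8 string. Returns NULL if the charset is
 * unsupported or the input is invalid; the caller frees the result. */
static char *to_utf8(const char *from, const char *in)
{
    iconv_t ic = iconv_open("UTF-8", from);
    if (ic == (iconv_t)-1)                 /* POSIX failure value */
        return NULL;

    size_t inbytes = strlen(in);
    size_t cap = inbytes * 8 + 1;          /* same generous bound as the patch, +1 for NUL */
    char *out = calloc(cap, 1);            /* zero-filled, so the result stays terminated */
    if (out == NULL) {
        iconv_close(ic);
        return NULL;
    }
    size_t outbytes = cap - 1;             /* reserve the last byte for the terminator */

    char *inbuf = (char *)in;              /* iconv() takes char **, not const char ** */
    char *outbuf = out;
    size_t rc = iconv(ic, &inbuf, &inbytes, &outbuf, &outbytes);
    if (rc != (size_t)-1)                  /* flush any pending shift state */
        rc = iconv(ic, NULL, NULL, &outbuf, &outbytes);
    iconv_close(ic);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;
}
```

With glibc, to_utf8("ISO-8859-1", "\xC6\xD8") yields the UTF-8 bytes C3 86 C3 98 ("ÆØ") and to_utf8("UTF-7", "+AMYBBA-") yields C3 86 C4 84 ("ÆĄ"); if the library has no UTF-7 converter, to_utf8 returns NULL instead of converting garbage.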

Conclusion:
In general, UTF-7 might be an even better choice than the default
ISO8859-1, especially for the T610.

diff -u'rNF^function' multisync-0.82/src/sync_vtype.c multisync-0.82-patched/src/sync_vtype.c
--- multisync-0.82/src/sync_vtype.c	2004-04-12 23:03:29.000000000 +0200
+++ multisync-0.82-patched/src/sync_vtype.c	2006-09-14 10:45:02.000000000 +0200
@@ -364,17 +364,56 @@
       }
       if ((opts & VOPTION_FIXCHARSET) && value && charset) {
 	iconv_t ic;
-	int t;
+	int t, valuelen = strlen(value);
+	const char *realCharSet = charset;
 	gboolean highchar = FALSE;
-	for (t = 0; t < strlen(value); t++)
+	gboolean fixcharset = FALSE;
+	for (t = 0; t < valuelen; ++t)
 	  if (value[t] > 127)
 	    highchar = TRUE;
-	if (highchar) {
-	  ic = iconv_open("UTF-8", charset);
+	fixcharset = highchar;
+	if (g_strcasecmp(charset, "UTF-7")==0 ||
+	    g_strcasecmp(charset, "UTF7")==0) {
+	  // fix the charset only if string is really valid UTF-7,
+	  // otherwise iconv may get stuck and lose some data).
+	  // UTF-7 is valid only if all chars are ASCII:
+	  if (highchar) fixcharset = FALSE;
+	  else {
+	    // convert only selected items (e.g. taken from SonyEricsson T610)
+	    // the other items are likely to be not UTF-7 encoded
+	    // (e.g. TEL number "+123-331" is not UTF-7, although looks like).
+	    if (g_strcasecmp(name, "N")==0 ||
+		g_strcasecmp(name, "ORG")==0 ||
+		g_strcasecmp(name, "TITLE")==0 ||
+		g_strcasecmp(name, "SUMMARY")==0 ||
+		g_strcasecmp(name, "LOCATION")==0 ||
+		g_strcasecmp(name, "DESCRIPTION")==0)
+	      fixcharset = TRUE;
+	    // every '+' should have a corresponding '-' and in between should
+	    // be Base64 characters if any ("+-" means '+' in UTF-7):
+	    const char *plus, *minus = value;
+	    while (fixcharset && (plus = strchr(minus, '+')) != NULL) {
+	      minus = strchr(plus, '-');
+	      if (minus == NULL) { fixcharset = FALSE; break; }
+	      const char *p;
+	      for (p=plus+1; p!=minus; ++p)
+		if (!isalnum(*p) && (*p != '+') && (*p != '/')) {
+		  fixcharset = FALSE; break;
+		}
+	    }
+	  }
+	  if (!fixcharset) {
+	    // at the end if UTF-7 fails, Western charset is the best bet:
+	    realCharSet = "ISO8859-1";
+	    fixcharset = TRUE;
+	  }
+	}
+	if (fixcharset) {
+	  ic = iconv_open("UTF-8", realCharSet);
 	  if (ic >= 0) {
-	    char *utfvalue = g_malloc0(65536);
 	    size_t inbytes = strlen(value);
-	    size_t outbytes = 65536;
+	    size_t outbytes = inbytes*8;
+	    char *utfvalue = g_malloc0(outbytes);
 	    char *inbuf = value, *outbuf = utfvalue;
 	    iconv(ic, &inbuf, &inbytes, &outbuf, &outbytes);
 	    g_free(value);
