I was going mad because some sites weren't working.
Some digging showed that the cause is a response body in the wrong charset
without a correct charset header.
Deeper digging turned up the default Russian Apache config (which assumes
that mobile User-Agents are braindead, so any ;charset= portion of the
header should be stripped regardless of the encoding of the response body).

Finally I found add_charset_headers in wap-appl.c and the
corresponding part in return_reply.

Who coded it _that_ way, and for what purpose?
At first sight it produces a complete mess that cannot be
compiled as XML correctly.

For now I have applied the attached patch and am happily watching the
wapbox logs, where the count of "failed xml compile" messages has dropped
drastically.
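
The charset selection order in the patch can be sketched outside of Kannel
roughly as follows. This is only an illustration in plain C: pick_charset and
needs_recoding are hypothetical names, not functions from wap-appl.c, and
plain C strings stand in for Octstr.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper, not part of Kannel: choose the charset to assume
 * for a WML/XHTML body. The order mirrors the patch: a declaration found
 * inside the body wins, then the charset from the HTTP Content-Type
 * header, and only then the UTF-8 default. */
static const char *pick_charset(const char *body_decl,
                                const char *header_charset)
{
    if (body_decl != NULL && body_decl[0] != '\0')
        return body_decl;
    if (header_charset != NULL && header_charset[0] != '\0')
        return header_charset;
    return "UTF-8";
}

/* Hypothetical helper: recoding to UTF-8 is needed whenever the chosen
 * charset is anything other than UTF-8 (compared case-insensitively). */
static int needs_recoding(const char *charset)
{
    assert(charset != NULL);
    return strcasecmp(charset, "UTF-8") != 0;
}
```

The comparison is "!= 0" on purpose. The old branch in the diff tests
"< 0", which as far as I can tell skips any charset name that sorts after
"UTF-8", windows-1251 for example.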

You can try this at home too.

Enjoy.
-- 
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
 This message represents the official view of the voices in my head
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 20:47:20+03:00 [EMAIL PROTECTED] 
#   Rework charset stuff, charset recoding and accept-charset headers part 0.
# 
# gw/wap-appl.c
#   2004/02/22 20:47:18+03:00 [EMAIL PROTECTED] +25 -0
#   replace crappy charset header and recoding with simplier
# 
# BitKeeper/etc/ignore
#   2004/02/22 20:47:18+03:00 [EMAIL PROTECTED] +6 -0
#   add couple ignores
# 
diff -Nru a/gw/wap-appl.c b/gw/wap-appl.c
--- a/gw/wap-appl.c     Mon Feb 23 00:19:56 2004
+++ b/gw/wap-appl.c     Mon Feb 23 00:19:56 2004
@@ -103,6 +103,7 @@
 #endif
 
 #define ENABLE_NOT_ACCEPTED 
+#define NEW_CHARSETS
 
 /*
  * Give the status the module:
@@ -668,6 +669,10 @@
  * to handle those charsets for all content types, just WML/XHTML. */
 static void add_charset_headers(List *headers) 
 {
+#ifdef NEW_CHARSETS
+    if (!http_charset_accepted(headers, "utf-8"))
+        http_header_add(headers, "Accept-Charset", "utf-8");
+#else
     long i, len;
     
     gw_assert(charsets != NULL);
@@ -677,6 +682,7 @@
         if (!http_charset_accepted(headers, charset))
             http_header_add(headers, "Accept-Charset", charset);
     }
+#endif
 }
 
 
@@ -1005,11 +1011,29 @@
             
             /* get charset used in content body, default to utf-8 if not present */
             if ((charset = find_charset_encoding(content.body)) == NULL)
+#ifdef NEW_CHARSETS
+                if (octstr_len(content.charset) > 0) {
+                    charset = octstr_duplicate(content.charset);
+                } else {
+                    charset = octstr_imm("UTF-8");
+                }
+#else
                 charset = octstr_imm("UTF-8"); 
+#endif
 
             /* convert to utf-8 if original charset is not utf-8 
              * and device supports it */
 
+#ifdef NEW_CHARSETS
+            if (octstr_case_compare(charset, octstr_imm("UTF-8")) != 0) {
+                debug("wsp",0,"Converting wml/xhtml from charset <%s> to UTF-8",
+                    octstr_get_cstr(charset));
+                if (charset_convert(content.body, octstr_get_cstr(charset), "UTF-8") >= 0) {
+                    octstr_destroy(content.charset);
+                    content.charset = octstr_create("UTF-8");
+                }
+            }
+#else
             if (octstr_case_compare(charset, octstr_imm("UTF-8")) < 0 &&
                 !http_charset_accepted(device_headers, octstr_get_cstr(charset))) {
                 if (!http_charset_accepted(device_headers, "UTF-8")) {
@@ -1047,6 +1071,7 @@
                     }
                 }
             }
+#endif
 
             octstr_destroy(charset);
         }
