Replying to Rune Saetre:
> I think this patch should be applied before 1.4.1 is released.
>
> I have submitted this before, but without the [PATCH] tag in the subject
> line. And here it is again, since I cannot see that it has been applied to
> CVS.
>
> This patches the gateway-1.4.0/gwlib/http.c file from the 1.4.0 version to
> set ISO-8859-1 as the charset for subtypes of "text" if no charset is
> specified in the headers.
What happens if you don't?
E.g., we receive some text/plain without an explicit charset, and then ...
> This is in accordance with RFC 2616, section 3.7.1.
> It also addresses bug #0000068.
> Moreover, wapbox is totally useless without it here in Norway...
So?
I know the reasoning I had when I invented NEW_CHARSETS. The phone claims to
support multiple encodings (for example latin1, koi8-r, and utf8), but it
lists them in-line with q=0.x weights. The web server then decides to
present Russian content in, for example, koi8-r. But the WML source carries
<?xml encoding="utf-8"?> or whatever anyway, and the WML compiler bombs at
the stage where it calls libxml.
So in that case I explicitly request UTF-8, and then recode any received
page to UTF-8 before feeding it to libxml.
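For reference, RFC 2616 section 14.2 defines how such a weighted header is read: an explicitly listed charset gets its q value (default 1), "*" covers everything not listed, and without "*" every unlisted charset gets 0, except ISO-8859-1, which gets 1. No Accept-Charset header at all means anything is acceptable. A minimal sketch of that rule; accept_charset_q() is a hypothetical helper, not Kannel's http_charset_accepted(), and it treats an unparsable q value as 0 rather than rejecting the header.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: returns the quality value that RFC 2616 section 14.2
 * assigns to `charset` under the Accept-Charset header value `header`.
 * A NULL header means no Accept-Charset was sent, so anything goes. */
double accept_charset_q(const char *header, const char *charset)
{
    double star_q = -1.0;          /* q of "*", or -1 if "*" is absent */
    const char *p = header;

    assert(charset != NULL);
    if (header == NULL)
        return 1.0;

    while (*p != '\0') {
        /* Skip list separators and whitespace before the next entry. */
        while (*p == ',' || isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        /* The charset token runs up to ';', ',' or whitespace. */
        const char *name = p;
        while (*p != '\0' && *p != ';' && *p != ',' && !isspace((unsigned char)*p))
            p++;
        size_t name_len = (size_t)(p - name);

        /* Optional ";q=0.x" parameter; the default quality is 1. */
        double q = 1.0;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == ';') {
            p++;
            while (isspace((unsigned char)*p))
                p++;
            if ((*p == 'q' || *p == 'Q') && p[1] == '=')
                q = strtod(p + 2, NULL);   /* garbage parses as 0 */
            while (*p != '\0' && *p != ',') /* skip the rest of this entry */
                p++;
        }

        if (name_len == 1 && name[0] == '*')
            star_q = q;
        else if (strlen(charset) == name_len &&
                 strncasecmp(name, charset, name_len) == 0)
            return q;                       /* explicitly listed */
    }

    if (star_q >= 0.0)
        return star_q;
    /* Not mentioned and no "*": 0, except ISO-8859-1, which defaults to 1. */
    return strcasecmp(charset, "ISO-8859-1") == 0 ? 1.0 : 0.0;
}
```

With a made-up header along the lines described above, "iso-8859-1, koi8-r;q=0.8, utf-8;q=0.5", utf-8 comes out at 0.5: acceptable but not preferred, so the server is within its rights to pick koi8-r.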
But text/plain ...
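The default the quoted patch relies on is short to state: under RFC 2616 section 3.7.1, a text/* body received without a charset parameter is taken to be ISO-8859-1. A stand-alone sketch of just that rule; effective_charset() is a hypothetical name and this is not the code the patch adds to gwlib/http.c.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: the charset to assume for a body of `media_type`
 * whose Content-Type carried `charset_param` (NULL or "" if absent).
 * Returns NULL when RFC 2616 section 3.7.1 supplies no default. */
const char *effective_charset(const char *media_type, const char *charset_param)
{
    assert(media_type != NULL);

    /* An explicit charset parameter always wins. */
    if (charset_param != NULL && charset_param[0] != '\0')
        return charset_param;

    /* Media type names are case-insensitive, so "TEXT/plain" counts too. */
    if (strncasecmp(media_type, "text/", 5) == 0)
        return "ISO-8859-1";

    return NULL;
}
```

For a text/plain reply without a charset this yields "ISO-8859-1"; for application/xml it yields NULL, leaving the choice to whatever the body itself declares.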
--
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
This message represents the official view of the voices in my head
Index: gateway.A9/configure.in
===================================================================
--- gateway.A9.orig/configure.in 2004-11-06 17:36:55.728060912 +0300
+++ gateway.A9/configure.in 2004-11-06 17:46:21.818002160 +0300
@@ -216,6 +216,18 @@
]
)
+AC_MSG_CHECKING([whether to do all wapbox xml processing in utf-8])
+AC_ARG_ENABLE(scharset,
+[  --enable-scharset      do all wapbox xml processing in utf-8],
+[
+ if test "$enableval" != yes; then
+ AC_MSG_RESULT(no)
+ else
+ AC_MSG_RESULT(yes)
+ AC_DEFINE(NEW_CHARSETS, 1, [Simplify wapbox charset processing])
+ fi
+], [AC_MSG_RESULT(no)])
+
dnl Extra feature checks
dnl GW_HAVE_TYPE_FROM(HDRNAME, TYPE, HAVENAME, DESCRIPTION)
Index: gateway.A9/gw/wap-appl.c
===================================================================
--- gateway.A9.orig/gw/wap-appl.c 2004-11-06 17:41:37.645202992 +0300
+++ gateway.A9/gw/wap-appl.c 2004-11-06 17:46:21.819002008 +0300
@@ -718,6 +718,10 @@
* to handle those charsets for all content types, just WML/XHTML. */
static void add_charset_headers(List *headers)
{
+#ifdef NEW_CHARSETS
+ if (!http_charset_accepted(headers, "utf-8"))
+ http_header_add(headers, "Accept-Charset", "utf-8");
+#else
long i, len;
gw_assert(charsets != NULL);
@@ -727,6 +731,7 @@
if (!http_charset_accepted(headers, charset))
http_header_add(headers, "Accept-Charset", charset);
}
+#endif
}
@@ -1055,11 +1060,29 @@
/* get charset used in content body, default to utf-8 if not
present */
if ((charset = find_charset_encoding(content.body)) == NULL)
+#ifdef NEW_CHARSETS
+ if (octstr_len(content.charset) > 0) {
+ charset = octstr_duplicate(content.charset);
+ } else {
+ charset = octstr_imm("UTF-8");
+ }
+#else
charset = octstr_imm("UTF-8");
+#endif
/* convert to utf-8 if original charset is not utf-8
* and device supports it */
+#ifdef NEW_CHARSETS
+ if (octstr_case_compare(charset, octstr_imm("UTF-8")) != 0) {
+ debug("wsp",0,"Converting wml/xhtml from charset <%s> to UTF-8",
+ octstr_get_cstr(charset));
+ if (charset_convert(content.body, octstr_get_cstr(charset), "UTF-8") >= 0) {
+ octstr_destroy(content.charset);
+ content.charset = octstr_create("UTF-8");
+ }
+ }
+#else
if (octstr_case_compare(charset, octstr_imm("UTF-8")) < 0 &&
!http_charset_accepted(device_headers,
octstr_get_cstr(charset))) {
if (!http_charset_accepted(device_headers, "UTF-8")) {
@@ -1097,6 +1120,7 @@
}
}
}
+#endif
octstr_destroy(charset);
}
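With NEW_CHARSETS defined, the wap-appl.c hunk reduces to a precedence rule (the encoding from the XML declaration, then the charset from the HTTP headers, then UTF-8) followed by a recode to UTF-8 when the source is anything else. A stand-alone sketch under those assumptions; pick_source_charset() and recode_to_utf8() are hypothetical names, and POSIX iconv(3) stands in for Kannel's charset_convert().

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical: same precedence as the NEW_CHARSETS branch. The encoding
 * found in the <?xml ...?> declaration first, then the HTTP charset,
 * otherwise UTF-8. */
const char *pick_source_charset(const char *xml_encoding, const char *http_charset)
{
    if (xml_encoding != NULL && xml_encoding[0] != '\0')
        return xml_encoding;
    if (http_charset != NULL && http_charset[0] != '\0')
        return http_charset;
    return "UTF-8";
}

/* Hypothetical: recode `in` (in_len bytes in charset `from`) to UTF-8 into
 * `out` (out_size bytes). Returns the number of bytes written, or -1 on
 * error. iconv(3) is used here as a stand-in for charset_convert(). */
long recode_to_utf8(const char *from, const char *in, size_t in_len,
                    char *out, size_t out_size)
{
    assert(from != NULL && in != NULL && out != NULL);

    if (strcasecmp(from, "UTF-8") == 0) {    /* already UTF-8: just copy */
        if (in_len > out_size)
            return -1;
        memcpy(out, in, in_len);
        return (long)in_len;
    }

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;                            /* charset unknown to iconv */

    char *inp = (char *)in;                   /* iconv() takes char ** */
    char *outp = out;
    size_t in_left = in_len, out_left = out_size;
    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &out_left); /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;                            /* bad input or out too small */
    return (long)(out_size - out_left);
}
```

As in the patch, a failed conversion (unknown charset, invalid input, or here a too-small buffer) is reported rather than fatal, so the caller can keep the original charset label instead of claiming UTF-8.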