Re: Character Set Question (WAP)

Jonathan Houser Fri, 08 Apr 2005 11:12:02 -0700


Stipe,

Hmm, this charset handling issue is really driving me mad... We already considered this at a very early stage. Unfortunately, I no longer recall it in its details. When I reviewed Paul's patch some time ago, I found myself in the middle of that material again and realized, from reviewing the current code, that I had considered these issues. Now you guys bring it up again ;)

I really need some "dummy way" to make it clear to me, sorry. My brain is definitely starting to swap.

Essentially, the issue as I tracked it in the code was this: as far as charsets are concerned, WML compilation happens in two stages. First, an upper-level function checks the incoming charset against the charsets the handset accepts (as advertised in its Accept-Charset header) and, if needed, converts the body. The lower-level function then converts the text to UTF-8 for the sake of the XML library and compiles it. The 'bug' I found (and fixed) was that the upper function had already converted the body, but the lower function didn't know this, so it went on to prefer the <?xml encoding=..> declaration over the incoming charset. So I just added a bool that tells the lower function the body has been converted and that the incoming charset should take precedence over the other checks. This is probably much easier to explain with some code:

UPPER FUNCTION:

---

if (charset_convert(content.body,
                    octstr_get_cstr(charset), "UTF-8") >= 0) {
    octstr_destroy(content.charset);
    content.charset = octstr_create("UTF-8");
    /* MY CHANGE HERE: remember that the body is now UTF-8 */
    content.was_converted = 1;
}

LOWER FUNCTION:

---

/* MY CHANGE IS THE FIRST IF CHECK */
if (was_converted) {
    /* the upper function already converted the body, so trust its charset */
    encoding = octstr_duplicate(charset);
} else if ((encoding = find_charset_encoding(wml_text)) != NULL) {
    /* ok, we rely on the xml preamble encoding */
} else if (charset && octstr_len(charset) > 0) {
    /* we had a HTTP response charset, use this */
    encoding = octstr_duplicate(charset);
} else {
    /* we had none, so use UTF-8 as default */
    encoding = octstr_create("UTF-8");
}

Does that make more sense?

Jon
