Replying to Yury.Mikhienko:
> fix that after rollback the
> http://www.kannel.org/cgi-bin/viewcvs.cgi/gateway/gwlib/http.c.diff?r1=1.225&r2=1.226&sortby=date
>
> patch
> now log is:
> 2005-03-09 17:55:04 [22126] [7] DEBUG: WML compiler: Charset is <>
> 2005-03-09 17:55:04 [22126] [7] DEBUG: WML compiler: Encoding is <UTF-8>
> -- may be reason in that?

Yes. Here in Russia there are a lot of content-provider servers that send no explicit charset in the Content-Type: header but supply the body in UTF-8. Or the charset is specified only in the XML preamble. In the patched version I use, I just commented out that fragment and assume charset = UTF-8 in such cases.

Actually there are many, many corner cases, and our content providers manage to hit them all. For example, they can have apache-rus with encoding on, which does funny things with the content, but the <?xml encoding= ?> preamble is obviously left untouched. In this case we should trust the HTTP headers. But some other provider thinks he can add Content-Type: ...; charset=ISO8859-5 or whatever and then load a bunch of WML files with different encoding= values in the XML preamble on ...

So for now I'm sticking with this logic:

- Check whether the XML document contains encoding= in its preamble. If it does, assume charset == preamble value.
- If the previous check was negative, try to get the charset from the HTTP headers.
- If that was negative too, then charset = UTF-8.
- If charset != UTF-8, the charset is not accepted by the device, and UTF-8 is accepted by the device, then recode the body from charset to UTF-8, strip the <?xml ... ?> preamble, and set charset = UTF-8.
- (Same for ISO8859-1.)
- Do the rest of the content processing.

This avoids most encoding glitches and double-encoding bugs.

Hope this helps.

-- 
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
This message represents the official view of the voices in my head
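
P.S. In case the decision order above is easier to follow as code, here is a minimal sketch of it. It is Python rather than Kannel's C, and it is not the actual patch: the names pick_charset, normalize and canon are hypothetical, the regexes assume an ASCII-compatible body, and reading "(same for ISO8859-1)" as "try ISO-8859-1 as a second recoding target" is my assumption. Error handling for undecodable bodies is left out.

```python
import codecs
import re
from typing import Iterable, Optional, Tuple

# Leading <?xml ... ?> declaration (assumes an ASCII-compatible encoding).
XML_DECL = re.compile(rb"\A\s*<\?xml\s[^?]*\?>")
# encoding="..." pseudo-attribute inside that declaration.
DECL_ENCODING = re.compile(rb"encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']")
# charset=... parameter of a Content-Type header value.
HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?([^\s;\"]+)", re.I)


def canon(name: str) -> str:
    """Canonical codec name, so that UTF8 and utf-8 compare equal."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def pick_charset(body: bytes, content_type: Optional[str]) -> str:
    """Preamble first, then HTTP header, then fall back to UTF-8."""
    decl = XML_DECL.match(body)
    if decl:
        enc = DECL_ENCODING.search(decl.group(0))
        if enc:
            return enc.group(1).decode("ascii")
    if content_type:
        param = HEADER_CHARSET.search(content_type)
        if param:
            return param.group(1)
    return "UTF-8"


def normalize(body: bytes, content_type: Optional[str],
              accepted: Iterable[str]) -> Tuple[bytes, str]:
    """Recode the body only when the device does not accept its charset."""
    charset = pick_charset(body, content_type)
    ok = {canon(name) for name in accepted}
    if canon(charset) in ok:
        return body, charset
    for target in ("UTF-8", "ISO-8859-1"):
        if canon(target) in ok:
            # Drop the preamble so it cannot contradict the new charset.
            text = XML_DECL.sub(b"", body, count=1).decode(charset)
            return text.encode(target), target
    # Nothing the device accepts is reachable: pass the body through.
    return body, charset
```

For example, normalize(body, "text/vnd.wap.wml; charset=UTF-8", ["UTF-8"]) on a body whose preamble says encoding="ISO-8859-5" trusts the preamble, recodes the body to UTF-8 and returns it without the preamble.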
