Hey, congratulations! I'm glad you figured it out. And it is not the simplest problem to master.
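
For anyone who finds this thread later, steps 1 and 3 of the fix quoted below might look roughly like this. It is only a sketch under assumptions: the class name ContentTypeCharset and the method fromHeader are made up for illustration, and the NekoHTML/Xerces wiring appears only in comments because those libraries are not part of the JDK.

```java
import java.nio.charset.Charset;

public class ContentTypeCharset {
    // Fallback used when the header carries no usable charset (step 3 in the thread).
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // header value such as "text/html; Charset=UTF-8".
    static String fromHeader(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        for (String part : contentType.split(";")) {
            int eq = part.indexOf('=');
            if (eq < 0 || !part.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                continue; // not the charset parameter
            }
            String name = part.substring(eq + 1).trim();
            // The value may be quoted: charset="utf-8"
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            try {
                if (Charset.isSupported(name)) {
                    return name;
                }
            } catch (IllegalArgumentException e) {
                // Empty or syntactically illegal charset name: fall through to the default.
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        String charset = fromHeader("text/html; Charset=UTF-8");
        System.out.println(charset);
        // Assumed wiring, following the quoted steps (not compiled here):
        //   XMLInputSource source =
        //       new XMLInputSource(null, null, null, inputStream, charset);
        //   parser.parse(source);
    }
}
```

The point of the design is that the bytes are decoded exactly once, with the charset the server actually announced, so nothing needs to be re-encoded later in the filter.
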

On Tue, Apr 6, 2010 at 6:40 PM, Thierry Legras <[email protected]> wrote:
> Hi,
>
> After having banged my head for weeks, I found the problem:
>
> The issue was not device dependent, but network access dependent.
>
> For some reason the pages' encoding, when accessed through my mobile
> operator's network, is changed to UTF-8, as shown in the Content-Type HTTP
> header (Content-Type: text/html; Charset=UTF-8), whereas the HTTP content
> still specifies ISO-8859-1 in the meta tag.
>
> So the final solution is to:
> 1) grab the encoding from the HTTP Content-Type header, if any
> 2) if so, set the feature
> http://cyberneko.org/html/features/scanner/ignore-specified-charset to
> false
> 3) in the XMLInputSource constructor, pass "ISO-8859-1" by default or the
> charset found in the Content-Type header, if any
> 4) in the filter's characters function, no further
> decoding/encoding/getBytes or any other charset change is required;
> XMLString.toString() will directly give the correct result :)
>
> Hope this will one day help another charset-newbie ;)
>
> Thierry.
>
>
> 2010/2/13 Thierry Legras <[email protected]>
>
>> Thanks for your reply.
>>
>> Yes, this is a java.lang.String. Indeed, all I want to do is to correctly
>> display the string in some View.
>>
>> OK, I got the point about Java Strings being 16 bits. If so, and as it is
>> not displayed well, I guess this means it was not properly created in the
>> first place.
>>
>> Maybe this issue is more related to my (bad) use of Xerces when I
>> initialize the Xerces XMLDocumentFilter object.
>>
>> XMLParserConfiguration parser = new HTMLConfiguration();
>> parser.setDocumentHandler(filter); // filter is an XMLDocumentFilter
>> XMLInputSource source = new XMLInputSource(null, null, null,
>>         myHttpResponse.getEntity().getContent(), "iso-8859-1");
>> parser.parse(source);
>>
>> I will check the Xerces resources in more detail; this is probably not an
>> Android-related topic after all.
>>
>> Thierry.
>>
>>
>> 2010/2/13 Frank Weiss <[email protected]>
>>
>>> First, some clarifications. Locale has nothing to do with character
>>> encoding. Java stores all character data internally as 16-bit Unicode,
>>> regardless of locale.
>>>
>>> I suspect that myString.getBytes("iso-8859-1") is erroneous. I'm assuming
>>> that myString is of type java.lang.String. What are you doing with the
>>> result and why do you want to encode a sequence of Unicode characters back
>>> to ISO-8859-1 (Latin1)?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Android Developers" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/android-developers?hl=en
>>
>
> --
> Thierry.

