Hi,

After banging my head on this for weeks, I found the problem:

The issue was not device-dependent, but network-access-dependent.

For some reason, when the pages are accessed through my mobile operator's
network, their encoding is changed to UTF-8, as shown in the Content-Type
HTTP header (Content-Type: text/html; Charset=UTF-8), whereas the HTML
content still specifies ISO-8859-1 in its meta tag.
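
To illustrate why this mismatch garbles the text, here is a minimal sketch in
plain Java. The class and method names are made up for the illustration, and
it assumes java.nio.charset.StandardCharsets is available (Java 7 or later).
It decodes the same UTF-8 bytes once with the charset the meta tag claims and
once with the charset the header announces:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
final class CharsetMismatchDemo {

    // Decode a response body with the given charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // "\u00e9" is e-acute; in UTF-8 it is the two bytes 0xC3 0xA9.
        byte[] body = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Trusting the stale meta tag: each byte becomes its own character.
        String wrong = decode(body, StandardCharsets.ISO_8859_1);
        // Trusting the Content-Type header: the two bytes become one character.
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println(wrong.length()); // 2
        System.out.println(right.length()); // 1
    }
}
```

Once the bytes have been decoded with the wrong charset, the resulting String
is already wrong, which is why the fix has to happen where the stream is
handed to the parser.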


So the final solution is to:
1) grab the encoding from the HTTP Content-Type header, if any
2) if so, set the feature
http://cyberneko.org/html/features/scanner/ignore-specified-charset to false
3) in the XMLInputSource constructor, pass "ISO-8859-1" by default, or the
charset found in the Content-Type header if any
4) in the filter's characters function, no further decoding/encoding/getBytes
or any other charset change is required; XMLString.toString() will
directly give the correct string :)
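
Steps 1 and 3 above can be sketched roughly as follows. This is only an
illustration, not the code from my application: the class name
ContentTypeCharset and the method charsetOf are made up, and the header value
is assumed to be available as a plain String (for example from the HTTP
response's Content-Type header).

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, named for this illustration only.
final class ContentTypeCharset {

    // Fallback used when the header carries no usable charset.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Returns the charset parameter of a Content-Type header value,
    // e.g. "text/html; Charset=UTF-8" gives "UTF-8", or the default.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            // Parameter names are case-insensitive ("Charset" vs "charset").
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                if (!value.isEmpty() && Charset.isSupported(value)) {
                    return value;
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: ignore it and fall back to the default.
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; Charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
    }
}
```

The returned name is what would be passed as the last argument of the
XMLInputSource constructor, in place of the hard-coded "iso-8859-1" in the
snippet quoted below.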

Hope this will one day help another charset newbie ;)

Thierry.


2010/2/13 Thierry Legras <[email protected]>

> Thanks for your reply.
>
> Yes, this is a java.lang.String. Indeed, all I want to do is to correctly
> display the string in some View.
>
> OK, I got the point about Java Strings being 16-bit. If so, and since it is
> not displayed correctly, I guess this means it was not properly created in
> the first place.
>
> Maybe this issue is more related to my (bad) use of Xerces when I
> initialize the Xerces XMLDocumentFilter object.
>
>             XMLParserConfiguration parser = new HTMLConfiguration();
>             // filter is an XMLDocumentFilter
>             parser.setDocumentHandler(filter);
>             XMLInputSource source = new XMLInputSource(null, null, null,
>                     myHttpResponse.getEntity().getContent(), "iso-8859-1");
>             parser.parse(source);
>
> I will check the Xerces resources in more detail; this is probably not an
> Android-related topic after all.
>
> Thierry.
>
>
> 2010/2/13 Frank Weiss <[email protected]>
>
>> First, some clarifications. Locale has nothing to do with character
>> encoding. Java stores all character data internally as 16-bit Unicode,
>> regardless of locale.
>>
>> I suspect that myString.getBytes("iso-8859-1") is erroneous. I'm assuming
>> that myString is of type java.lang.String. What are you doing with the
>> result and why do you want to encode a sequence of Unicode characters back
>> to ISO-8859-1 (Latin1)?
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Android Developers" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]<android-developers%[email protected]>
>> For more options, visit this group at
>> http://groups.google.com/group/android-developers?hl=en
>
>
>


-- 
Thierry.

