Hey, congratulations! I'm glad you figured it out; it is not the
simplest problem to master.

On Tue, Apr 6, 2010 at 6:40 PM, Thierry Legras <[email protected]> wrote:

> Hi,
>
> After banging my head against this for weeks, I found the problem:
>
> The issue was not device-dependent but network-access-dependent.
>
> For some reason, when the pages are fetched through my mobile operator's
> network, their encoding is changed to UTF-8, as shown in the Content-Type
> HTTP header (Content-Type: text/html; Charset=UTF-8), whereas the HTML
> content still specifies ISO-8859-1 in its meta tag.
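To see why the header/meta-tag mismatch matters, here is a small stdlib-only Java illustration. It assumes the body bytes really are UTF-8, as the header says; the class name, method name and sample word are hypothetical. The same bytes give the intended text when decoded as UTF-8, but each accented letter turns into two Latin-1 characters when the bytes are decoded as the ISO-8859-1 declared in the meta tag.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: decodes one byte sequence with two different charsets.
final class CharsetMismatchDemo {

    // Turns raw HTTP body bytes into a String using the given charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Hypothetical page text; the e-acute becomes the two bytes 0xC3 0xA9 in UTF-8.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset from the HTTP header gives the intended text.
        String fromHeader = decode(body, StandardCharsets.UTF_8);

        // Decoding with the meta tag's ISO-8859-1 maps every byte to one char,
        // so 0xC3 0xA9 shows up as two characters (A-tilde, copyright sign).
        String fromMetaTag = decode(body, StandardCharsets.ISO_8859_1);

        System.out.println(fromHeader.length());  // 4
        System.out.println(fromMetaTag.length()); // 5
    }
}
```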
>
>
> So the final solution is to:
> 1) grab the encoding from the HTTP Content-Type header, if any;
> 2) if there is one, set the feature
> http://cyberneko.org/html/features/scanner/ignore-specified-charset to
> false;
> 3) in the XMLInputSource constructor, pass "ISO-8859-1" by default, or the
> charset found in the Content-Type header if any;
> 4) in the filter's characters() method, no further decoding, encoding,
> getBytes() or other charset conversion is required; XMLString.toString()
> directly gives the correct text :)
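Steps 1 and 3 can be sketched in plain Java as follows. This is a minimal sketch, not Thierry's actual code: the class and method names are hypothetical, and it assumes the header value is already available as a string (with the Apache HttpClient bundled in Android, `myHttpResponse.getEntity().getContentType()` returns the header, which may be null). The returned name is what would go in the last argument of the `XMLInputSource` constructor.

```java
// Hypothetical helper for steps 1 and 3: choose the encoding to hand to
// XMLInputSource from the value of the HTTP Content-Type header.
final class ContentTypeCharset {

    // Fallback used when the header carries no charset (step 3).
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Returns the charset parameter of a value such as
    // "text/html; Charset=UTF-8", or null if there is none.
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            int eq = p.indexOf('=');
            // Parameter names are case-insensitive ("Charset" vs "charset").
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = p.substring(eq + 1).trim();
                // Parameter values may be quoted: charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Header charset if present, otherwise the ISO-8859-1 default.
    static String encodingFor(String contentType) {
        String cs = charsetParam(contentType);
        return cs != null ? cs : DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(encodingFor("text/html; Charset=UTF-8")); // UTF-8
        System.out.println(encodingFor("text/html"));                // ISO-8859-1
    }
}
```

Keeping the header parsing separate from the parser setup makes the fallback rule easy to check on its own, independently of NekoHTML.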
>
> Hope this will one day help another charset newbie ;)
>
> Thierry.
>
>
> 2010/2/13 Thierry Legras <[email protected]>
>
>> Thanks for your reply.
>>
>> Yes, this is a java.lang.String. All I want to do is display the string
>> correctly in a View.
>>
>> OK, I get the point about Java Strings being 16-bit. If so, and since it
>> is not displayed correctly, I guess it was not created properly in the
>> first place.
>>
>>
>> Maybe this issue is more related to my (bad) use of Xerces when I
>> initialize the Xerces XMLDocumentFilter object:
>>
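>>             import org.apache.xerces.xni.parser.XMLInputSource;
>>             import org.apache.xerces.xni.parser.XMLParserConfiguration;
>>             import org.cyberneko.html.HTMLConfiguration;
>>
>>             XMLParserConfiguration parser = new HTMLConfiguration();
>>             parser.setDocumentHandler(filter); // filter is an XMLDocumentFilter
>>             XMLInputSource source = new XMLInputSource(null, null, null,
>>                     myHttpResponse.getEntity().getContent(), "iso-8859-1");
>>             parser.parse(source);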
>>
>> I will look into the Xerces resources in more detail; this is probably not
>> an Android-related topic after all.
>>
>> Thierry.
>>
>>
>> 2010/2/13 Frank Weiss <[email protected]>
>>
>>  First, some clarifications. Locale has nothing to do with character
>>> encoding. Java stores all character data internally as 16-bit Unicode,
>>> regardless of locale.
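Frank's point can be illustrated with a short stdlib-only sketch (the class name is hypothetical). The letter é encoded as one ISO-8859-1 byte or as two UTF-8 bytes decodes to the same single 16-bit char. The charset only matters at the moment bytes become a String, and when the charset is passed explicitly the default Locale plays no part.

```java
import java.nio.charset.StandardCharsets;

// Illustration: after decoding, a String holds UTF-16 code units; the
// charset the bytes came from is no longer visible in the String.
final class DecodedStringsAreUnicode {

    // The letter e-acute encoded two different ways.
    static final byte[] LATIN1_BYTES = { (byte) 0xE9 };
    static final byte[] UTF8_BYTES = { (byte) 0xC3, (byte) 0xA9 };

    static String fromLatin1() {
        return new String(LATIN1_BYTES, StandardCharsets.ISO_8859_1);
    }

    static String fromUtf8() {
        return new String(UTF8_BYTES, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String a = fromLatin1();
        String b = fromUtf8();
        // Both are the single 16-bit char U+00E9.
        System.out.println(a.equals(b));                       // true
        System.out.println(a.length() + " " + b.length());     // 1 1
        System.out.println(Integer.toHexString(a.charAt(0)));  // e9
    }
}
```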
>>>
>>> I suspect that myString.getBytes("iso-8859-1") is erroneous. I'm assuming
>>> that myString is of type java.lang.String. What are you doing with the
>>> result and why do you want to encode a sequence of Unicode characters back
>>> to ISO-8859-1 (Latin1)?
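The thread doesn't say what the result of myString.getBytes("iso-8859-1") was used for. One common use, assumed here purely for illustration, is re-encoding a String to Latin-1 and decoding those bytes as UTF-8 to "repair" it. The sketch below (hypothetical names) shows why that is fragile: it only helps if the string was mis-decoded in the first place, it damages a correctly decoded string, and characters outside Latin-1 are replaced with '?'.

```java
import java.nio.charset.StandardCharsets;

// Assumed scenario: getBytes("iso-8859-1") used to re-decode a String as UTF-8.
final class GetBytesPitfall {

    // Re-encodes a String to Latin-1 bytes, then decodes those bytes as UTF-8.
    static String reinterpretLatin1AsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A string that was mis-decoded (UTF-8 bytes read as Latin-1) gets repaired...
        String misDecoded = "caf\u00c3\u00a9";
        System.out.println(reinterpretLatin1AsUtf8(misDecoded).equals("caf\u00e9")); // true

        // ...but a correctly decoded string is damaged: the lone 0xE9 byte is
        // malformed UTF-8 and becomes the replacement character U+FFFD.
        String correct = "caf\u00e9";
        System.out.println(reinterpretLatin1AsUtf8(correct).equals("caf\ufffd")); // true

        // Characters outside Latin-1 are lost: getBytes(Charset) substitutes
        // the charset's replacement byte, '?' (0x3F).
        byte[] euro = "\u20ac".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(euro.length == 1 && euro[0] == '?'); // true
    }
}
```

Choosing the right charset where the bytes are first decoded, as in Thierry's final solution, avoids this kind of after-the-fact patching.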
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Android Developers" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]<android-developers%[email protected]>
>>> For more options, visit this group at
>>> http://groups.google.com/group/android-developers?hl=en
>>
>>
>>
>
>
> --
> Thierry.
>
