Magnus Olstad Hansen wrote:
Hello Oleg,
Sorry - I only tried what seemed easiest to me first here. Today is the
first day I've had the time to look into this again.
A wire + context log should be attached to this mail. Hope this can
clarify what is going on.
PS! I'm pretty sure that EntityUtils.getContentCharset() returns the
right charset (UTF-8 in this case) - so the confusing point is why
EntityUtils.toString() returns 0x3F for the Norwegian letters. As far as
I could tell from the sources the charset from getContentCharset() has
top priority in toString()...
Thanks for helping,
Magnus
This is what I am seeing in the wire log:
---
FINE: << "[0x9][0x9]<li><a
href="http://go.vg.no/cgi-bin/go.cgi/meny/http://elisanett.vgb.no/">Skjermdump
(blogg)</a></li>[\n]"
FINE: << "[0x9][0x9]<li><a
href="http://go.vg.no/cgi-bin/go.cgi/meny/http://elisabeth.vgb.no/">Frue
p[0xc3][0xa5] veggen (blogg)</a></li>[\n]"
---
As far as I can tell the content is correctly encoded as UTF-8:
[0xc3][0xa5] is the two-byte UTF-8 sequence for 'å'.
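For anyone who wants to check this independently of HttpClient, here is a
minimal sketch that decodes the bytes from the wire log with the plain JDK.
The class and method names (Utf8Check, decode) are made up for illustration
and are not part of HttpClient.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient: decodes raw bytes as UTF-8.
public class Utf8Check {

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'p' followed by the two bytes shown in the wire log.
        byte[] wire = {'p', (byte) 0xc3, (byte) 0xa5};
        String text = decode(wire);

        // Three bytes decode to two characters: 'p' and U+00E5 (a with ring).
        System.out.println(text.length());                       // prints 2
        System.out.println(Integer.toHexString(text.charAt(1))); // prints e5
    }
}
```

The code points are printed as numbers rather than as text so that the
result does not depend on what the console can display.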
This is a snippet of output produced using EntityUtils#toString:
---
href="http://go.vg.no/cgi-bin/go.cgi/meny/http://elisabeth.vgb.no/">Frue
på veggen (blogg)</a></li>
---
As far as I can tell, the non-ASCII characters are decoded correctly.
Apparently you were printing the output of your application to a console
that simply could not handle non-ASCII characters.
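The 0x3F can be reproduced without any HTTP involved. The sketch below
assumes the console behaves like a PrintStream whose charset cannot represent
'å' (US-ASCII here); the actual console encoding in the original report is
not known. The names ConsoleEncodingDemo and printWith are invented for this
example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo: shows what a PrintStream writes for a given charset.
public class ConsoleEncodingDemo {

    // Prints the text through a PrintStream using the named charset and
    // returns the bytes that reached the underlying stream.
    static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("[0x%02x]", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "p\u00e5"; // correctly decoded string: 'p' + U+00E5

        // A stream that cannot encode U+00E5 substitutes '?' (0x3f).
        System.out.println(hex(printWith(text, "US-ASCII"))); // prints [0x70][0x3f]

        // A UTF-8 stream writes the same two bytes seen in the wire log.
        System.out.println(hex(printWith(text, "UTF-8")));    // prints [0x70][0xc3][0xa5]
    }
}
```

The string is identical in both cases; only the encoding applied when it is
written out differs.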
This whole story was a non-issue from the very beginning.
Oleg
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]