Magnus Olstad Hansen wrote:

Hello Oleg,

Sorry - I only tried what seemed easiest to me first here. Today is the first day I've had the time to look into this again. A wire + context log should be attached to this mail. I hope it clarifies what is going on.

PS! I'm pretty sure that EntityUtils.getContentCharset() returns the right charset (UTF-8 in this case) - so the confusing point is why EntityUtils.toString() returns 0x3F for the Norwegian letters. As far as I could tell from the sources, the charset from getContentCharset() has top priority in toString()...

Thanks for helping,
Magnus


That's what I am seeing in the wire log

---
FINE: << "[0x9][0x9]<li><a href="http://go.vg.no/cgi-bin/go.cgi/meny/http://elisanett.vgb.no/";>Skjermdump (blogg)</a></li>[\n]"
FINE: << "[0x9][0x9]<li><a href="http://go.vg.no/cgi-bin/go.cgi/meny/http://elisabeth.vgb.no/";>Frue p[0xc3][0xa5] veggen (blogg)</a></li>[\n]"
---

As far as I can tell, the content is correctly encoded as UTF-8: the byte pair [0xc3][0xa5] is the UTF-8 encoding of 'å'.


This is a snippet of output produced using EntityUtils#toString
---
href="http://go.vg.no/cgi-bin/go.cgi/meny/http://elisabeth.vgb.no/";>Frue på veggen (blogg)</a></li>
---

As far as I can tell, the non-ASCII characters appear correctly decoded.

Apparently you were printing the output of your application to a console that simply could not handle non-ASCII characters.
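
For illustration only, here is a minimal sketch of how a 0x3F ('?') can show up on output even though the decoding step was correct. It uses only JDK classes; the class and method names (ConsoleCharsetDemo, decodeUtf8, encodeFor) are made up for this example, and US-ASCII is merely assumed as a stand-in for whatever charset the console in question was using.

```java
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of HttpClient or HttpCore.
public class ConsoleCharsetDemo {

    // Decode raw bytes as UTF-8, which is how the wire log shows the body encoded.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Encode text for an output with the given charset (assumed console charset).
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for any character the charset cannot represent.
    static byte[] encodeFor(String text, Charset consoleCharset) {
        return text.getBytes(consoleCharset);
    }

    public static void main(String[] args) throws Exception {
        // "p" followed by the two bytes seen in the wire log for 'å'.
        byte[] wire = {'p', (byte) 0xC3, (byte) 0xA5};
        String text = decodeUtf8(wire); // correctly decoded: "på"

        // What an ASCII-only output receives: prints "0x70 0x3F"
        for (byte b : encodeFor(text, StandardCharsets.US_ASCII)) {
            System.out.printf("0x%02X ", b);
        }
        System.out.println();

        // Writing through an explicitly UTF-8 PrintStream avoids the lossy
        // re-encoding, provided the terminal itself renders UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);
    }
}
```

The string is intact in memory in both cases; the '?' is only introduced when the text is re-encoded for an output that cannot represent 'å'. Inspecting the char values of the string, or writing it to a file with an explicit charset, is a more reliable check than looking at the console.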

This whole story was a non-issue from the very beginning.

Oleg

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

