Hi Hannes,
After looking again at your issue with Thierry, we concluded that there
is no bug, just some classic character encoding confusion :-).
What happens is probably this: when you use your plain socket, your
objects are serialized to XML, which will contain metadata about the
encoding used by the client. When the server receives it and deserializes
the XML, it can decode it to the local character encoding when printing
its content.
Now, with your Restlet approach, the object is serialized using Java
serialization (binary scheme). When it gets deserialized, it restores
strings in the JVM as UTF-16 (Java's internal encoding for strings).
When you print those strings to your console, there is an issue because
the console expects another encoding (ISO-8859-1) and has no way to
automatically convert your UTF-16 string.
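To illustrate the kind of mismatch I mean, here is a small sketch. It is a generic demonstration and not taken from your application; the class and method names are made up for the example. It shows that the same string produces different bytes depending on the encoding, and that reading ISO-8859-1 bytes as UTF-8 yields the replacement character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Restlet or of your service.
public class EncodingMismatchDemo {

    // Decodes raw bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "chaîne", written with a Unicode escape so that the source file
        // encoding cannot interfere with the demonstration.
        String text = "cha\u00eene";

        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // The accented character takes one byte in ISO-8859-1, two in UTF-8.
        System.out.println(latin1.length); // 6
        System.out.println(utf8.length);   // 7

        // Reading ISO-8859-1 bytes as UTF-8 cannot decode the accented
        // character, so the decoder substitutes U+FFFD for it.
        String wrong = decode(latin1, StandardCharsets.UTF_8);
        System.out.println(wrong.indexOf('\uFFFD') >= 0); // true

        // Decoding with the charset that produced the bytes round-trips.
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1).equals(text)); // true
    }
}
```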
One solution is to change the encoding of your console to UTF-16. The
other is to convert your string to your local encoding before printing.
You could use the java.io.OutputStreamWriter for this purpose, wrapping
the System.out stream and passing ISO-8859-1 as the encoding.
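A minimal sketch of that second option could look like the following. The class name, the helper method and the sample string are only for illustration; adapt the charset name to whatever your console actually expects.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical example class, not part of Restlet.
public class ConsoleEncodingExample {

    // Writes the text followed by a line separator to the given stream,
    // encoding the characters with the named charset.
    static void printWithEncoding(OutputStream out, String text, String charsetName)
            throws IOException {
        Writer writer = new OutputStreamWriter(out, charsetName);
        writer.write(text);
        writer.write(System.lineSeparator());
        // Flush without closing, so that System.out remains usable afterwards.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // "Une chaîne de caractères.", written with Unicode escapes so that
        // the source file encoding does not matter.
        String text = "Une cha\u00eene de caract\u00e8res.";
        printWithEncoding(System.out, text, "ISO-8859-1");
    }
}
```

The writer is flushed rather than closed on purpose: closing it would also close the underlying System.out stream.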
Some text editors are smart enough and can detect the encoding of a text
file. That's probably why Gedit works for you. I hope this clarifies the
issue. We are closing issue #525 now.
Best regards,
Jérôme Louvel
--
Restlet ~ Founder and Lead developer ~ http://www.restlet.org
Noelios Technologies ~ Co-founder ~ http://www.noelios.com
Hannes Ebner wrote:
Hi Thierry,
Thierry Boileau wrote:
I had a look at the issue and I don't see what's wrong. I was able to
send a serialized object from a client using UTF-8 to a server using
ISO-8859-1 without encoding issues.
Could you send us a reproducible test case, and also send us the trace
of the following code on both the client and server sides?
I planned to write a reproducible test case, but I didn't get that far.
I tried your small application first (the one attached to the bug
report) and ran it directly on the server which uses ISO-8859-1.
The server sent some data to itself and printed it on the console, so I
guess there is no change in the encoding involved.
The result that I got on the console was:
Une cha�ne de caract�res.
I could also reproduce it on another server with ISO-8859-1, but not on
a third one which was configured for UTF-8.
With UTF-8 I got the correct string:
Une chaîne de caractères.
I also tried to pipe the console output into a file, which I transferred
to my development machine (which uses UTF-8). A simple cat of this file
on the console showed question marks like above, but when I opened the
same file on the same machine with a graphical editor (gedit), the
special characters showed up correctly. I'm confused now; it seems to be
an encoding issue which might not even be related to restlets.
The questions now are: how do I solve it, and why does it work to
transfer the special characters with the old version of my service,
which does not use restlets (it uses Object-to-XML serialization over a
socket)?
I also attached the requested system properties to this mail, one file
with the settings from an ISO-8859-1 server, the other one with UTF-8. I
don't know whether it helps you find something; I couldn't see
anything strange.
Perhaps you or somebody else on the list has some experience with
problems related to character encodings; I'm out of ideas right now.
Best regards,
Hannes