Hi Hannes,

After looking again at your issue with Thierry, we concluded that there is no bug, just some classic character-encoding confusion :-).

What happens is probably this: when you use your plain socket, your objects are serialized to XML, which contains metadata about the encoding used by the client. When the server receives the XML and deserializes it, it can convert the content to the local character encoding when printing it.

Now, with your Restlet approach, the object is serialized using Java serialization (a binary scheme). When it gets deserialized, the strings are restored in the JVM as UTF-16 (Java's internal encoding for strings). When you print those strings to your console, there is an issue because the console expects another encoding (ISO-8859-1) and has no way to automatically convert your UTF-16 string.
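
To illustrate the kind of mismatch described above, here is a minimal sketch. It does not reproduce your exact setup: the class name EncodingMismatchSketch and the helper reinterpret are made up for this example. It only shows that the same string produces different bytes in ISO-8859-1 and UTF-8, and that reading bytes with the wrong encoding yields replacement characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Restlet or of the test case in the bug report.
public class EncodingMismatchSketch {

    // Encodes the text with one charset and decodes the resulting bytes with
    // another, mimicking a reader that assumes the wrong encoding.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String original = "Une cha\u00eene de caract\u00e8res.";

        // One byte per character in ISO-8859-1, two bytes per accented letter in UTF-8.
        System.out.println(original.getBytes(StandardCharsets.ISO_8859_1).length);
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length);

        // ISO-8859-1 bytes read as UTF-8: the accented letters are malformed
        // input for the UTF-8 decoder and become U+FFFD replacement characters.
        String garbled = reinterpret(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(garbled.contains("\uFFFD"));
    }
}
```

The string in memory is the same in every case; only the byte representation and the reader's assumption about it differ.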

One solution is to change the encoding of your console to UTF-16. The other is to convert your string to your local encoding before printing. You could use the java.io.OutputStreamWriter for this purpose, wrapping the System.out stream and passing ISO-8859-1 as the encoding.
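
The second option could look roughly like the sketch below. The class name ConsoleEncodingSketch and the helper latin1Writer are hypothetical, and ISO-8859-1 is assumed to be what your console expects; adjust the charset if your terminal is configured differently.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only java.io and java.nio classes are used.
public class ConsoleEncodingSketch {

    // Wraps a byte stream in a writer that encodes characters as ISO-8859-1.
    // The second PrintWriter argument enables auto-flush on println.
    static PrintWriter latin1Writer(OutputStream out) {
        return new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.ISO_8859_1), true);
    }

    public static void main(String[] args) {
        // Assumption: the console expects ISO-8859-1.
        PrintWriter console = latin1Writer(System.out);
        console.println("Une cha\u00eene de caract\u00e8res.");
        console.flush();
    }
}
```

The writer performs the conversion from Java's in-memory string to the target encoding explicitly, so the output no longer depends on the JVM's default charset.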

Some text editors are smart enough to detect the encoding of a text file. That's probably why Gedit works for you. I hope this clarifies the issue. We are closing issue #525 now.

Best regards,
Jérôme Louvel
--
Restlet ~ Founder and Lead developer ~ http://www.restlet.org
Noelios Technologies ~ Co-founder ~ http://www.noelios.com



Hannes Ebner wrote:
Hi Thierry,

Thierry Boileau wrote:
I had a look at the issue and I don't see what's wrong. I was able to
send a serialized object from a client using UTF-8 to a server using
ISO-8859-1 without encoding issues.
Could you send us a reproducible test case, and send us also the trace
of the following code on both client and server side?

I planned to write a reproducible test case, but I didn't get that far.
I tried your small application first (the one attached to the bug
report) and ran it directly on the server which uses ISO-8859-1.

The server sent some data to itself and printed it on the console, so I
guess there is no change in the encoding involved.

The result that I got on the console was:

Une cha�ne de caract�res.

I could also reproduce it on another server with ISO-8859-1, but not on
a third one which was configured for UTF-8.

With UTF-8 I got the correct string:

Une chaîne de caractères.

I also tried to pipe the console output into a file, which I transferred
to my development machine (which uses UTF-8). A simple cat of this file
on the console showed question marks like above, but when I opened the
same file on the same machine with a graphical editor (gedit), the
special characters showed up correctly. I'm confused now, it seems to be
an encoding issue which might not even be related to restlets.

The questions are now: how do I solve it, and why does it work to
transfer the special characters with the old version of my service,
which does not use restlets (it uses Object-to-XML serialization over a
socket)?

I also attached the requested system properties to this mail, one file
with the settings from an ISO-8859-1 server, the other one with UTF-8. I
don't know whether it helps you to find something, I couldn't see
anything strange.

Perhaps you have or somebody else on the list has some experience with
problems related to character encodings, I'm out of ideas right now.

Best regards,
Hannes
