Alan Conway wrote:
On 05/06/2010 04:39 PM, Rafael Schloming wrote:
Jonathan Robie wrote:
Java, C++, and Python each have a different understanding of what a
string is, what a character is, and how they approach Unicode.
If I place a string or a character in a map using one language, and
read it back in another language, what do I get for each language?
Does this depend at all on the platform?
In maps, do these types round-trip? Suppose a sender creates a char or
a string, and sends it to a client in another language, then the
receiver reads it and sends it back. Can the original sender compare
the string it sent to the one it received and expect it to always be
equal?
This won't work with characters, since Python doesn't have a character
type. Any character will come out as a one-character unicode string,
which means that when you read it back you'll get a unicode string rather
than a character.
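
A minimal sketch of that point, assuming present-day Python 3, where `str` plays the role of the "unicode" type discussed in this thread. The variable names are illustrative only and nothing here touches the Qpid API:

```python
# Python has no separate character type: a "character" is just a
# string of length one, so a char and a string cannot be told apart
# by type once the value has reached Python.
ch = "\u00e9"          # LATIN SMALL LETTER E WITH ACUTE, one code point
word = "hello"

same_type = type(ch) is type(word)   # both are str
length = len(ch)                     # one code point, so length 1
indexed = word[0]                    # indexing a str yields another str

print(same_type, length, type(indexed).__name__)
```

So a receiver written in Python can only hand back a one-character string; whether the original sender treats that as equal to its char depends on the sender's language binding.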
In general if you stick to the "unicode" type in python and the "String"
type in Java you should be able to safely round-trip between the two. I
don't know how this will interact with the C++ map message API. I know
that at one point unicode inside a map would come out as raw bytes on the
C++ side, but I don't know if this is still true.
All strings handled by qpid in C++ are raw bytes. The qpid libraries do
not gratuitously modify strings for unicode or any other reason.
Applications may choose to use UTF-8 or any other encoding they like; the
qpid libs are neutral.
I don't think this is actually true. At least I don't see how C++ maps
could possibly work properly if it were true.
When AMQP encodes a string inside a map, the type-code on the wire
explicitly indicates whether it is UTF-8, UTF-16, etc. This means that
the C++ library can't just pass raw bytes back to the application for a
string encoded inside a map; at a minimum it needs to tell the
application what encoding those bytes are in.
Likewise, when the application passes data to the library to be encoded,
it needs to somehow indicate what the encoding for that data is (whether
it is UTF-8, UTF-16, or any of the other variants of unicode allowed).
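
To illustrate why the encoding has to travel with the bytes, here is a small sketch in Python 3 using only the standard codecs. The sample text is arbitrary, and nothing here is Qpid-specific:

```python
# The same text produces different byte sequences under different
# encodings, so raw bytes alone do not identify the text.
text = "na\u00efve"                       # "naive" with a diaeresis on the i

utf8_bytes = text.encode("utf-8")         # 6 bytes: the accented i takes two
utf16_bytes = text.encode("utf-16-be")    # 10 bytes: two per code unit

# Decoding with the wrong assumption does not necessarily fail.
# Here it silently yields three unrelated characters instead.
misread = utf8_bytes.decode("utf-16-be")

print(len(utf8_bytes), len(utf16_bytes), misread == text)
```

A receiver that gets only the bytes has no reliable way to tell which of these it is holding, which is the information the type code on the wire carries.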
What I was saying above is that I don't think the C++ API will
automatically decode the raw bytes into a wide string for you, but it
has to at least give you enough information to do that yourself,
otherwise it is essentially throwing information away and won't
interoperate correctly.
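
As a rough sketch of what "enough information" could look like, the following Python 3 snippet pairs the raw bytes with an encoding label. `TaggedString`, `to_wire` and `from_wire` are hypothetical names invented for this illustration; they are not the Qpid API and do not reproduce the actual AMQP type codes:

```python
from typing import NamedTuple


class TaggedString(NamedTuple):
    """Raw bytes plus the label needed to interpret them (hypothetical)."""
    encoding: str   # stands in for the type code carried on the wire
    raw: bytes      # the undecoded payload


def to_wire(text: str, encoding: str = "utf-8") -> TaggedString:
    # The sender states which encoding it used.
    return TaggedString(encoding, text.encode(encoding))


def from_wire(value: TaggedString) -> str:
    # The receiver decodes using the label rather than guessing.
    return value.raw.decode(value.encoding)


original = "Qpid \u2713"
round_trips = [
    from_wire(to_wire(original, enc)) == original
    for enc in ("utf-8", "utf-16-be")
]
print(round_trips)
```

Even if a library never decodes for the application, returning something shaped like this pair would let the application do the decoding itself.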
--Rafael
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:[email protected]