Re: Portable Strings, Characters in Maps

Rafael Schloming Thu, 06 May 2010 15:28:31 -0700

Alan Conway wrote:

On 05/06/2010 04:39 PM, Rafael Schloming wrote:

Jonathan Robie wrote:

Java, C++, and Python each have a different understanding of what a
string is, what a character is, and how they approach Unicode.


If I place a string or a character in a map using one language, and
read it back in another language, what do I get for each language?
Does this depend at all on the platform?

In maps, do these types round-trip? Suppose a sender creates a char or
a string, and sends it to a client in another language, then the
receiver reads it and sends it back. Can the original sender compare
the string it sent to the one it received and expect it to always be
equal?


This won't work with characters, since Python doesn't have a character
type. Any characters will come out as unicode strings of length 1,
which means when you read them back you'll get a unicode string rather
than a character.
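
A minimal sketch of the point above, assuming Python 3, where `str`
plays the role of the Python 2 "unicode" type mentioned here. It uses no
Qpid API, and the variable names are illustrative only.

```python
# Python has no separate character type: indexing a string yields
# another string of length 1, not a distinct "char" value.
text = "h\u00e9llo"   # "h", e-acute, "llo"
ch = text[1]

print(type(ch).__name__)  # str
print(len(ch))            # 1
```

A receiver written in Python therefore cannot tell whether the sender
meant a character or a one-character string.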

In general, if you stick to the "unicode" type in Python and the "String"
type in Java, you should be able to safely round-trip between the two. I
don't know how this will interact with the C++ map message API. I know
that at one point unicode inside a map would come out as raw bytes on the
C++ side; however, I don't know if this is still true.
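
As a rough illustration of the round-trip question, here is a sketch
that assumes Python 3 and UTF-8 on the wire. It does not exercise Qpid
or any map message API; it only shows that text survives an
encode/decode cycle, and that raw bytes are not equal to the text they
encode.

```python
# Sender side: a text string, encoded to bytes for the wire.
sent = "Gr\u00fc\u00dfe"          # German greeting with two non-ASCII letters
wire = sent.encode("utf-8")

# Receiver side: decode to text, then encode again to send it back.
received = wire.decode("utf-8")
echoed_wire = received.encode("utf-8")

# Original sender: decode the echo and compare with what was sent.
echoed = echoed_wire.decode("utf-8")
round_trips = (echoed == sent)

# If one side hands back raw bytes instead of text, the comparison
# fails even though the bytes on the wire are identical.
raw_equals_text = (wire == sent)

print(round_trips, raw_equals_text)
```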

All strings handled by qpid in C++ are raw bytes. Qpid libraries do not
gratuitously modify strings for unicode or any other reasons.
Applications may choose to use UTF-8 or any other encoding they like; the
qpid libs are neutral.

I don't think this is actually true. At least I don't see how C++ maps
could possibly work properly if it were true.

When AMQP encodes a string inside a map, the type-code on the wire
explicitly indicates whether it is UTF-8, UTF-16, etc. This means that
the C++ library can't just pass raw bytes back to the application for a
string encoded inside a map, at a minimum it needs to tell the
application what format those bytes are.

Likewise, when the application passes data to the library to be encoded,
it needs to somehow indicate what the encoding for that data is (whether
it is UTF-8, UTF-16, or any of the other variants of unicode allowed).

What I was saying above is that I don't think the C++ API will
automatically decode the raw bytes into a wide string for you, but it
has to at least give you enough information to do that yourself,
otherwise it is essentially throwing information away and won't
interoperate correctly.
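
To make the argument concrete, here is a small sketch in Python 3 of
raw bytes that travel together with an encoding tag. The `TaggedString`
type, the helper functions, and the tag values are all hypothetical;
they are not AMQP type-codes and not part of any Qpid API.

```python
from typing import NamedTuple


class TaggedString(NamedTuple):
    """Raw bytes plus the name of the encoding they are in (hypothetical)."""
    encoding: str
    raw: bytes


def encode_tagged(text: str, encoding: str) -> TaggedString:
    # The sender states the encoding explicitly instead of leaving it implied.
    return TaggedString(encoding, text.encode(encoding))


def decode_tagged(value: TaggedString) -> str:
    # The receiver can only decode correctly because the tag was kept.
    return value.raw.decode(value.encoding)


original = "caf\u00e9"
as_utf8 = encode_tagged(original, "utf-8")
as_utf16 = encode_tagged(original, "utf-16-be")

# Same text, different bytes on the wire.
same_bytes = (as_utf8.raw == as_utf16.raw)

# With the tag, both decode back to the original text.
both_decode = (decode_tagged(as_utf8) == original
               and decode_tagged(as_utf16) == original)

print(same_bytes, both_decode)
```

Without the `encoding` field, a receiver holding only `raw` would have
to guess which of the two byte sequences it had been given.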


--Rafael


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
