OK, so my understanding from reading the two JIRA issues is that Python and Ruby treat the Thrift "string" type as raw, unencoded bytes, whereas Java treats it as UTF-8 encoded bytes. What was the rationale behind declaring keys to be of type "string" rather than of type "binary"? With "binary", presumably Java wouldn't try to interpret keys as UTF-8.
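
To make the symptom concrete, here is a minimal Python sketch of what I think is happening. It rests on assumptions: that the Java side decodes the key as UTF-8 and substitutes U+FFFD for malformed input (approximated here with `errors="replace"`), and that mapping each raw byte through Latin-1 is an acceptable reading of "explicitly encode to UTF8 first". The function names are made up for illustration and are not part of Thrift or Cassandra.

```python
def roundtrip_as_utf8_string(raw_key: bytes) -> bytes:
    """Simulate a peer that decodes the key as UTF-8 text and re-encodes it.

    Assumption: malformed input is replaced with U+FFFD, whose UTF-8
    encoding is 0xEF 0xBF 0xBD (octal \\357\\277\\275).
    """
    text = raw_key.decode("utf-8", errors="replace")
    return text.encode("utf-8")


def encode_key(raw_key: bytes) -> bytes:
    """Hypothetical client-side workaround: make the key valid UTF-8.

    Each raw byte is treated as a Latin-1 code point (0-255), then the
    resulting text is encoded as UTF-8 before it goes on the wire.
    """
    return raw_key.decode("latin-1").encode("utf-8")


def decode_key(wire_key: bytes) -> bytes:
    """Inverse of encode_key: recover the original raw bytes when reading."""
    return wire_key.decode("utf-8").encode("latin-1")


if __name__ == "__main__":
    # Two different one-byte keys collapse to the same replacement bytes.
    for raw in (b"\x85", b"\xc0"):
        print(raw, "->", roundtrip_as_utf8_string(raw))

    # With the workaround, the key survives the lossy peer unchanged.
    wire = encode_key(b"\x85")
    print(decode_key(roundtrip_as_utf8_string(wire)) == b"\x85")
```

If this model is right, it would explain why inserting under \205 and reading under \300 hits the same row: both keys become the same three bytes. The cost of the workaround is that every byte above \177 takes two bytes on the wire.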

Edmond

On Mon, Dec 7, 2009 at 3:09 PM, Jonathan Ellis <[email protected]> wrote:
> I suspect you will need to explicitly encode to UTF8 first, then.
> (And decode when reading.)
>
> My reading of the relevant issues
> (https://issues.apache.org/jira/browse/THRIFT-395,
> https://issues.apache.org/jira/browse/THRIFT-414) is that this won't
> be fixed any time soon.
>
> -Jonathan
>
> On Mon, Dec 7, 2009 at 4:56 PM, Edmond Lau <[email protected]> wrote:
>> This particular client was in Ruby.
>>
>> On Mon, Dec 7, 2009 at 2:49 PM, Jonathan Ellis <[email protected]> wrote:
>>> (bugs in thrift, that is)
>>>
>>> On Mon, Dec 7, 2009 at 4:49 PM, Jonathan Ellis <[email protected]> wrote:
>>>> what language are your clients in? there are definitely some bugs
>>>> there when communicating b/t client and server of different languages.
>>>> :(
>>>>
>>>> On Mon, Dec 7, 2009 at 4:43 PM, Edmond Lau <[email protected]> wrote:
>>>>> I'm using non-ascii keys on Cassandra, relatively close to trunk at
>>>>> r880926, and some of my keys get mangled.
>>>>>
>>>>> As a simple test case, if I insert a one-byte key anywhere between
>>>>> \200 and \377 (octal for 128 to 255) through the thrift interface, and
>>>>> then query back my data with multi get, I get a hash back that has
>>>>> "\357\277\275" as the key. All those one-byte keys get mapped to the
>>>>> same bucket, so if I insert with the key \205, I get the data back
>>>>> when querying for \300. So either a) there's a bug in thrift, b)
>>>>> Cassandra doesn't support non-ascii keys, or c) Cassandra is mangling
>>>>> my key somewhere.
>>>>>
>>>>> Has anyone else run into this issue?
>>>>>
>>>>> Edmond
>>>>>
>>>>
>>>
>>
>
