Evan Weaver wrote:
I wanted to start a small discussion to see if there is any interest
in supporting alternative wire protocols or perhaps junking Thrift to
some degree.
Some options:
* Use JSON over HTTP
* Use BSON over...something (http://www.mongodb.org/display/DOCS/BSON)
* Use ASN.1 over...something
* Use Protocol Buffers over...something
* Use Thrift, but package Cassandra-specific clients for each language
I have not thought too coherently about this but generic Thrift seems
to be a pain point for everybody.
Hi Evan,
I've been playing around again with Cassandra recently and I agree
Thrift is a pain point, and that was the case when I looked at the
project originally. But I think it's not so much Thrift as how the data
is presented to clients.
Much more important to me is that using Cassandra means reading and
understanding the service API calls in cassandra.thrift. Personally I
wouldn't have designed a fine-grained API over the generic data
structures implied by a column store, where simple filters and selects
become a litany of get_by_X calls. For example, 4 methods return
list<column_t>, 2 return list<string>, 2 return list<superColumn_t>,
there are 5 get_slice and 4 get_column variants. And typical of RPC,
none of this stuff composes. In something like Django there are chained
filter() calls (Hibernate has similar Criteria calls), which makes for a
stable programming API where all you need to figure out is the criteria
to pass. With Cassandra you have to do that and find the right method;
the API surface is much bigger. Simple key stores and Dynamo-style models
get away with fine-grained RPC as there's nothing much to do except the
key lookup and multiget use cases. Fine-grained calls are not a design
sweet spot for column store APIs imvho.
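
To make the composability point concrete, here is a rough Python sketch of
the chained-filter style over a toy in-memory column store. Everything in it
(Query, keys, columns_between, limit, run, and the demo data) is made up for
illustration; none of it is taken from cassandra.thrift or any real client.
It only shows how a few composable criteria can stand in for a family of
get_by_X variants.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for a column family: row key -> {column name: value}.
Row = Dict[str, str]
Store = Dict[str, Row]


@dataclass(frozen=True)
class Query:
    """Immutable, chainable query. Each criteria call returns a new Query."""
    store: Store
    row_filters: Tuple[Callable[[str], bool], ...] = ()
    column_filters: Tuple[Callable[[str], bool], ...] = ()
    max_columns: Optional[int] = None

    def keys(self, *wanted: str) -> "Query":
        # Restrict the result to the given row keys.
        wanted_set = frozenset(wanted)
        return replace(
            self, row_filters=self.row_filters + (lambda k: k in wanted_set,)
        )

    def columns_between(self, start: str, finish: str) -> "Query":
        # Keep columns whose name sorts within [start, finish].
        return replace(
            self,
            column_filters=self.column_filters
            + (lambda c: start <= c <= finish,),
        )

    def limit(self, n: int) -> "Query":
        # Cap the number of columns returned per row.
        return replace(self, max_columns=n)

    def run(self) -> Dict[str, Row]:
        # Apply all accumulated criteria in one pass over the store.
        result: Dict[str, Row] = {}
        for key in sorted(self.store):
            if not all(f(key) for f in self.row_filters):
                continue
            names = [
                c for c in sorted(self.store[key])
                if all(f(c) for f in self.column_filters)
            ]
            if self.max_columns is not None:
                names = names[: self.max_columns]
            result[key] = {c: self.store[key][c] for c in names}
        return result


# Made-up demo data.
demo: Store = {
    "alice": {"age": "30", "email": "alice@example.com", "name": "Alice"},
    "bob": {"age": "41", "email": "bob@example.com", "name": "Bob"},
}

# One entry point, criteria composed in any order.
sliced = Query(demo).keys("alice").columns_between("a", "f").run()
capped = Query(demo).columns_between("a", "f").limit(1).run()
```

The client only has to learn the criteria, not which of several method
variants happens to return the shape it wants.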
I think the question for Cassandra is not so much about serialization
techniques and speed as whether RPC is the best way to expose the data.
Bill