Hello,

While creating the data model for my application I found that I need to
submit multiple slice predicates per row (for reasons I can explain, but I
believe they are unrelated to the topic). I started digging around the
Thrift API code and eventually realized that it could be simplified while
also gaining the ability to execute multiple row and column slices at once.

I am proposing to replace all get operations with a single operation (I'm
going to call it get_slice for now):

map<binary, list<ColumnOrSuperColumn>> get_slice(
    list<KeyRange> keys,
    ColumnParent column_parent,
    list<SlicePredicate> slices,
    list<IndexExpression> index_clause,
    ConsistencyLevel consistency_level)

and the count equivalent:

map<binary, i32> get_slice_count(
    list<KeyRange> keys,
    ColumnParent column_parent,
    list<SlicePredicate> slices,
    list<IndexExpression> index_clause,
    ConsistencyLevel consistency_level)

The return value could instead be list<KeySlice> (with a corresponding
list<KeyCount> for the count call) if the order of returned rows matters.
Keeping rows in partitioner order may be important (key order is easily
reconstructed on the client using a TreeMap or equivalent), but I do not
know enough to make that call.
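
To illustrate the client-side reordering mentioned above, here is a minimal
sketch. The KeySlice class below is a hypothetical stand-in for the
Thrift-generated struct (column names are plain strings to keep it short),
and byKeyOrder is a name I made up for illustration. Note that
ByteBuffer.compareTo compares bytes as signed values, so a client that
wants unsigned byte order would have to pass its own Comparator to the
TreeMap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.TreeMap;

class KeyOrderSketch {
    // Hypothetical stand-in for the Thrift KeySlice struct: a row key plus
    // its columns (reduced to column names for brevity).
    static final class KeySlice {
        final ByteBuffer key;
        final List<String> columns;

        KeySlice(ByteBuffer key, List<String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    // Re-index rows returned in partitioner order by their key.
    // TreeMap sorts with ByteBuffer.compareTo (lexicographic, signed bytes).
    static TreeMap<ByteBuffer, List<String>> byKeyOrder(List<KeySlice> rows) {
        TreeMap<ByteBuffer, List<String>> sorted = new TreeMap<>();
        for (KeySlice row : rows) {
            sorted.put(row.key, row.columns);
        }
        return sorted;
    }

    // Helper for building keys from text.
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

The reverse is not possible: partitioner order cannot be rebuilt from a map
on the client without knowing the partitioner, which is the argument for
returning a list.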

For backwards compatibility, existing client methods can easily be built on
top of the new method (e.g. get would become a get_slice with a single
KeyRange whose start and end key are both the specified key, a single
SlicePredicate naming the specified column, and a null index_clause).
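
A rough sketch of that mapping, to make it concrete. Everything here is
hypothetical: the classes are simplified stand-ins for the Thrift-generated
structs, SliceReader stands for the proposed call (column_parent reduced to
a column family name, consistency level omitted), and the wrapper returns
null where the real get throws NotFoundException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class GetSliceCompat {
    // Hypothetical stand-ins for the Thrift structs; only the fields used here.
    static final class KeyRange {
        final ByteBuffer startKey, endKey;
        KeyRange(ByteBuffer startKey, ByteBuffer endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    static final class SlicePredicate {
        final List<ByteBuffer> columnNames;
        SlicePredicate(List<ByteBuffer> columnNames) {
            this.columnNames = columnNames;
        }
    }

    static final class Column {
        final ByteBuffer name, value;
        Column(ByteBuffer name, ByteBuffer value) {
            this.name = name;
            this.value = value;
        }
    }

    // The proposed unified read, simplified for the sketch.
    interface SliceReader {
        Map<ByteBuffer, List<Column>> getSlice(List<KeyRange> keys,
                                               String columnFamily,
                                               List<SlicePredicate> slices,
                                               List<Object> indexClause);
    }

    // Legacy single-column get expressed through getSlice: one KeyRange with
    // start == end, one by-name predicate, and no index clause.
    static Column get(SliceReader reader, ByteBuffer key,
                      String columnFamily, ByteBuffer columnName) {
        List<KeyRange> keys = Collections.singletonList(new KeyRange(key, key));
        List<SlicePredicate> slices = Collections.singletonList(
                new SlicePredicate(Collections.singletonList(columnName)));
        Map<ByteBuffer, List<Column>> rows =
                reader.getSlice(keys, columnFamily, slices, null);
        List<Column> columns = rows.get(key);
        // Returning null instead of throwing keeps the sketch short.
        return (columns == null || columns.isEmpty()) ? null : columns.get(0);
    }

    // Helper for building keys and names from text.
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

The other legacy calls (multiget_slice, get_range_slices and so on) would
presumably follow the same pattern with more keys or a different predicate.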

Internally, the existing read paths would be consolidated as well. In
CassandraServer each KeyRange would generate a RangeSliceCommand, which
would be extended (most likely by versioning, similar to how ReadCommand is
subclassed) to allow multiple slice predicates. That in turn would require
QueryFilter to be extended to allow multiple slice predicates. Any code
past that (ColumnFamilyStore) would be unaffected, with the choice between
calling search() and getRangeSlice() now based on whether an IndexClause
was supplied and there is an index for it.

All other existing read commands (SliceByNamesReadCommand,
SliceFromReadCommand and so on) would be implemented on top of
RangeSliceCommand to preserve backwards compatibility while allowing the
removal of StorageProxy.read() and fetchRows(). These methods are called
directly by the CQL code, but those calls could also be changed to execute
range slices instead.

I understand this looks like a huge change but I believe that the
simplification coming from having a single read command while adding the
flexibility of multiple key and column predicates in a single call is worth
it.

Does this seem worth doing? I would appreciate your feedback on this.

Regards

Jerry P
