The current sstable format has very high per-cell overhead.  For small
values this can easily be 5x or more.  Compression can mitigate this
on disk but we still pay in CPU for every read and write.

For many workloads this is acceptable, but for others (e.g. high-
performance time series) it is not.  While Cassandra does better than
other NoSQL systems like MongoDB or Couchbase, it is not competitive
with things like kdb in the financial space.

Benedict posted a high-level outline at [1] of how we can rebuild
sstables with a hybrid columnar storage format, allowing very low
per-cell overhead.  (Basically, value + compressed timestamp.)  I'd
like to target this for 3.0, but to do that I'd like to get consensus
on our implementation plan first.  This is not a small change and the
closer we can get to agreement on the approach before starting to put
patches together, the better.
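
To make "value + compressed timestamp" concrete, here is a minimal
sketch of one way a run of timestamps could be compressed: store the
first timestamp in full, then store each later one as a variable-length
delta from its predecessor.  This is only an illustration under my own
assumptions (non-decreasing microsecond timestamps, hypothetical
function names); it is not the encoding proposed in [1].

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative deltas")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def encode_timestamps(timestamps: list[int]) -> bytes:
    """First timestamp as 8 big-endian bytes, then one varint per delta."""
    if not timestamps:
        return b""
    out = bytearray(timestamps[0].to_bytes(8, "big"))
    for prev, cur in zip(timestamps, timestamps[1:]):
        out += encode_varint(cur - prev)
    return bytes(out)


def decode_timestamps(data: bytes) -> list[int]:
    """Inverse of encode_timestamps."""
    if not data:
        return []
    result = [int.from_bytes(data[:8], "big")]
    pos = 8
    while pos < len(data):
        delta, shift = 0, 0
        while True:
            byte = data[pos]
            pos += 1
            delta |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        result.append(result[-1] + delta)
    return result


# Four cells written 1 ms apart, timestamps in microseconds.
stamps = [1_400_000_000_000_000 + i * 1000 for i in range(4)]
encoded = encode_timestamps(stamps)
print(len(stamps) * 8, "bytes raw ->", len(encoded), "bytes encoded")
```

With regularly spaced writes, every timestamp after the first costs
about two bytes here instead of eight, which is the kind of saving that
matters when the values themselves are small.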

Please post your feedback to JIRA over the next week and we will
incorporate that into an implementation proposal that breaks it down
into discrete, reviewable pieces.

[1] https://issues.apache.org/jira/browse/CASSANDRA-7447

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
