[DISCUSSION] High-volume counters in Cassandra

Johan Oskarsson Thu, 02 Sep 2010 12:02:10 -0700

In the last few months Digg and Twitter have been using a counter patch that 
lets Cassandra act as a high-volume realtime counting system. Atomic counters 
enable new applications that were previously difficult to implement at scale, 
including realtime analytics and large-scale systems monitoring.


Discussion
There are currently two different suggestions for how to implement counters in 
Cassandra. The discussion has so far been limited to those following the jiras 
(CASSANDRA-1072 and CASSANDRA-1421) closely and we don’t seem to be nearing a 
decision. I want to open it up to the Cassandra community at large to get 
additional feedback.

Below are very basic and brief introductions to the alternatives. Please help 
us move forward by reading through the docs and jiras and reply to this thread 
with your thoughts. Would one or the other, both or neither be suitable for 
inclusion in Cassandra? Is there a third option? What can we do to reach a 
decision?

We believe that both options can coexist; their strengths and weaknesses make 
them suitable for different use cases.


CASSANDRA-1072 + CASSANDRA-1397
https://issues.apache.org/jira/browse/CASSANDRA-1072 (see design doc)
https://issues.apache.org/jira/browse/CASSANDRA-1397

How does it work?
A node is picked as the primary replica for each write. The context byte array 
for a column contains (primary replica ip, value). Any previous data with the 
same ip is reconciled with the new increment and put as the column value.

Concerns raised
* an increment in flight will be lost if the wrong node goes down
* if an increment operation times out it’s impossible to know if it has been 
executed or not

The most recent jira comment proposes a new API method for increments that 
reflects the different consistency level guarantees.


CASSANDRA-1421
https://issues.apache.org/jira/browse/CASSANDRA-1421

How does it work?
Each increment for a counter is stored as a (UUID, value) tuple. The read 
operations will read all these increment tuples for a counter, reconcile and 
return. On a regular interval the values are all read and reconciled into one 
value to reduce the amount of data required for each read operation. 

Concerns raised
* poor read performance, especially for time-series data
* post aggregation reconciliation issues


Again, we feel that both options can co-exist, especially if the 1072 patch 
uses a new API method that reflects its different consistency level guarantees. 
Our proposal is to accept 1072 into trunk with the new API method, and when an 
implementation of 1421 is completed it can be accepted alongside.

[DISCUSSION] High-volume counters in Cassandra

Reply via email to