Re: [DISCUSSION] High-volume counters in Cassandra

Jeremy Hanna Fri, 03 Sep 2010 16:16:06 -0700

So if the patch ditches Clocks and is refactored to be more cleanly separated from the rest of the code, it could go in?


On Sep 2, 2010, at 3:55 PM, Jonathan Ellis wrote:

> I still have not seen any response to my other misgivings about 1072
> that I have raised on the ticket.  Specifically, the existing patch is
> based around a Clock structure that, since 580 is a dead end, is no
> longer necessary.
> 
> I'm also uneasy about adding 200k of code that meshes as poorly with
> the rest of Cassandra as this does.  The more it can be split off into
> separate code paths, the better.  Adding its own thrift method is a
> good start, but it should go deeper than that.
> 
> On Thu, Sep 2, 2010 at 12:01 PM, Johan Oskarsson <jo...@oskarsson.nu> wrote:
>> In the last few months Digg and Twitter have been using a counter patch that 
>> lets Cassandra act as a high-volume realtime counting system. Atomic 
>> counters enable new applications that were previously difficult to implement 
>> at scale, including realtime analytics and large-scale systems monitoring.
>> 
>> Discussion
>> There are currently two different suggestions for how to implement counters 
>> in Cassandra. The discussion has so far been limited to those following the 
>> jiras (CASSANDRA-1072 and CASSANDRA-1421) closely and we don’t seem to be 
>> nearing a decision. I want to open it up to the Cassandra community at large 
>> to get additional feedback.
>> 
>> Below are very basic and brief introductions to the alternatives. Please 
>> help us move forward by reading through the docs and jiras and reply to this 
>> thread with your thoughts. Would one or the other, both or neither be 
>> suitable for inclusion in Cassandra? Is there a third option? What can we do 
>> to reach a decision?
>> 
>> We believe that both options can coexist; their strengths and weaknesses 
>> make them suitable for different use cases.
>> 
>> 
>> CASSANDRA-1072 + CASSANDRA-1397
>> https://issues.apache.org/jira/browse/CASSANDRA-1072 (see design doc)
>> https://issues.apache.org/jira/browse/CASSANDRA-1397
>> 
>> How does it work?
>> A node is picked as the primary replica for each write. The context byte 
>> array for a column contains (primary replica ip, value). Any previous data 
>> with the same ip is reconciled with the new increment and put as the column 
>> value.
>> 
>> Concerns raised
>> * an increment in flight will be lost if the wrong node goes down
>> * if an increment operation times out it’s impossible to know if it has been 
>> executed or not
>> 
>> The most recent jira comment proposes a new API method for increments that 
>> reflects the different consistency level guarantees.
>> 
>> 
>> CASSANDRA-1421
>> https://issues.apache.org/jira/browse/CASSANDRA-1421
>> 
>> How does it work?
>> Each increment for a counter is stored as a (UUID, value) tuple. A read 
>> operation fetches all of these increment tuples for a counter, reconciles 
>> them and returns the result. At a regular interval the values are all read 
>> and reconciled into one value to reduce the amount of data required for 
>> each read operation.
>> 
>> Concerns raised
>> * poor read performance, especially for time-series data
>> * post-aggregation reconciliation issues
>> 
>> 
>> Again, we feel that both options can co-exist, especially if the 1072 patch 
>> uses a new API method that reflects its different consistency level 
>> guarantees. Our proposal is to accept 1072 into trunk with the new API 
>> method, and when an implementation of 1421 is completed it can be accepted 
>> alongside.
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
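
For anyone skimming the thread, here is a rough sketch of how I read the two approaches quoted above. It is not taken from either patch: the class and method names are made up for illustration, an in-memory dict stands in for the column storage, and summing the per-replica entries on read is my assumption, since the summary above only says that data with the same ip is reconciled with the new increment.

```python
import uuid
from collections import defaultdict


class ReplicaContextCounter:
    """1072-style sketch (hypothetical names): one partial sum per primary replica."""

    def __init__(self):
        # Stand-in for the context byte array: primary replica ip -> value.
        self.context = defaultdict(int)

    def increment(self, primary_replica_ip, delta):
        # Previous data with the same ip is reconciled with the new increment.
        self.context[primary_replica_ip] += delta

    def read(self):
        # Assumption: the counter value is the sum over all replica entries.
        return sum(self.context.values())


class IncrementLogCounter:
    """1421-style sketch (hypothetical names): every increment kept as (UUID, value)."""

    def __init__(self):
        self.increments = {}  # UUID -> value

    def increment(self, delta):
        op_id = uuid.uuid4()
        self.increments[op_id] = delta
        return op_id

    def read(self):
        # A read has to look at every stored increment tuple.
        return sum(self.increments.values())

    def compact(self):
        # Periodic reconciliation: fold all tuples into a single one.
        total = self.read()
        self.increments = {uuid.uuid4(): total}


by_replica = ReplicaContextCounter()
by_replica.increment("10.0.0.1", 3)
by_replica.increment("10.0.0.1", 2)
by_replica.increment("10.0.0.2", 5)

by_log = IncrementLogCounter()
by_log.increment(3)
by_log.increment(4)
by_log.compact()
```

In the first sketch the stored state grows with the number of replicas that have acted as primary, while in the second it grows with the number of increments until the next compaction, which seems to be where the read performance concern for 1421 comes from.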
