Re: [DISCUSSION] High-volume counters in Cassandra

Torsten Curdt Thu, 02 Sep 2010 16:11:07 -0700

I cannot say anything about the implementation details of the patch or
even the two different approaches. Not sure that even matters that
much at this stage. What I can say, though, is that I get the feeling
there is a lot of desire and drive in the community to get at least
something in. Ignoring this carries a particularly high fork risk IMO.
Especially if the changes are deep and the patch is big, this sounds
like a painful situation for everyone involved.


I understand it adds a contract to maintain and the exact approach is
not set in stone. But I also like the idea of adding both approaches
like Johan suggested. The feature could still be marked experimental.
That should loosen the contract a little. But at least it would be
something to work with.

Another option would be to create a "counter" branch - which of course
also needs to be maintained and merged. But this is still better than
having everyone build and maintain their own set of patches.

Of course the community needs to agree on the general vision first -
that counters would be a great thing to have in Cassandra. If that is
the case ... well ... then let's move in that direction.

My 2 cents
--
Torsten

On Thu, Sep 2, 2010 at 23:46, Adam Samet <adam.sa...@gmail.com> wrote:
> If a new api method is added for counters, the thrift interface Clock
> structure wouldn't be needed, but that's getting to be an
> implementation detail.  Whether 1072 is an appropriate step forward is
> tangential to that issue.
>
> The patch has been refactored several times based on JIRA feedback.
> If the objections are code level, perhaps it's time to take this to
> code review to address specific concerns.
>
> On Thu, Sep 2, 2010 at 1:55 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> I still have not seen any response to my other misgivings about 1072
>> that I have raised on the ticket.  Specifically, the existing patch is
>> based around a Clock structure that, since 580 is a dead end, is no
>> longer necessary.
>>
>> I'm also uneasy about adding 200k of code that meshes as poorly with
>> the rest of Cassandra as this does.  The more it can be split off into
>> separate code paths, the better.  Adding its own thrift method is a
>> good start, but it should go deeper than that.
>>
>> On Thu, Sep 2, 2010 at 12:01 PM, Johan Oskarsson <jo...@oskarsson.nu> wrote:
>>> In the last few months Digg and Twitter have been using a counter patch 
>>> that lets Cassandra act as a high-volume realtime counting system. Atomic 
>>> counters enable new applications that were previously difficult to 
>>> implement at scale, including realtime analytics and large-scale systems 
>>> monitoring.
>>>
>>> Discussion
>>> There are currently two different suggestions for how to implement counters 
>>> in Cassandra. The discussion has so far been limited to those following the 
>>> jiras (CASSANDRA-1072 and CASSANDRA-1421) closely and we don’t seem to be 
>>> nearing a decision. I want to open it up to the Cassandra community at 
>>> large to get additional feedback.
>>>
>>> Below are very basic and brief introductions to the alternatives. Please 
>>> help us move forward by reading through the docs and jiras and reply to 
>>> this thread with your thoughts. Would one or the other, both or neither be 
>>> suitable for inclusion in Cassandra? Is there a third option? What can we 
>>> do to reach a decision?
>>>
>>> We believe that both options can coexist; their strengths and weaknesses 
>>> make them suitable for different use cases.
>>>
>>>
>>> CASSANDRA-1072 + CASSANDRA-1397
>>> https://issues.apache.org/jira/browse/CASSANDRA-1072 (see design doc)
>>> https://issues.apache.org/jira/browse/CASSANDRA-1397
>>>
>>> How does it work?
>>> A node is picked as the primary replica for each write. The context byte 
>>> array for a column contains (primary replica ip, value). Any previous data 
>>> with the same ip is reconciled with the new increment and put as the column 
>>> value.
>>>
>>> Concerns raised
>>> * an increment in flight will be lost if the wrong node goes down
>>> * if an increment operation times out it’s impossible to know if it has 
>>> been executed or not
>>>
>>> The most recent jira comment proposes a new API method for increments that 
>>> reflects the different consistency level guarantees.
>>>
>>>
>>> CASSANDRA-1421
>>> https://issues.apache.org/jira/browse/CASSANDRA-1421
>>>
>>> How does it work?
>>> Each increment for a counter is stored as a (UUID, value) tuple. The read 
>>> operations will read all these increment tuples for a counter, reconcile 
>>> and return. On a regular interval the values are all read and reconciled 
>>> into one value to reduce the amount of data required for each read 
>>> operation.
>>>
>>> Concerns raised
>>> * poor read performance, especially for time-series data
>>> * post aggregation reconciliation issues
>>>
>>>
>>> Again, we feel that both options can co-exist, especially if the 1072 patch 
>>> uses a new API method that reflects its different consistency level 
>>> guarantees. Our proposal is to accept 1072 into trunk with the new API 
>>> method, and when an implementation of 1421 is completed it can be accepted 
>>> alongside.
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
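
For readers skimming the thread, here is a rough Python sketch of the
two bookkeeping schemes described in Johan's quoted message. It is not
taken from either patch: the class and method names are made up for
illustration, everything lives in one in-memory process, and
replication, timeouts and failure handling (the very things the
"concerns raised" lists are about) are ignored entirely.

```python
import uuid
from collections import defaultdict


class ReplicaContextCounter:
    """1072-style sketch (hypothetical names): the column context keeps
    one partial count per primary replica address."""

    def __init__(self):
        # primary replica ip -> partial count written through that replica
        self.context = defaultdict(int)

    def increment(self, primary_ip, delta):
        # Previous data with the same ip is reconciled with the new
        # increment; here "reconciled" is modelled as a simple sum.
        self.context[primary_ip] += delta

    def value(self):
        # The counter value is the sum of all per-replica partial counts.
        return sum(self.context.values())


class TupleLogCounter:
    """1421-style sketch (hypothetical names): every increment is stored
    as its own (UUID, value) tuple."""

    def __init__(self):
        # increment id -> increment value
        self.tuples = {}

    def increment(self, delta):
        # Each write adds a new tuple instead of updating an existing one.
        self.tuples[uuid.uuid4()] = delta

    def value(self):
        # A read has to visit every stored tuple and reconcile them.
        return sum(self.tuples.values())

    def aggregate(self):
        # Periodic step: fold all tuples into a single one so that later
        # reads touch less data.
        total = self.value()
        self.tuples = {uuid.uuid4(): total}
```

In this toy model the first class stores as many entries as there are
primary replicas, while the second stores one entry per increment
until aggregate() runs, which is one way to picture the read
performance concern listed for 1421.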
