+1 for Jonathan Ellis. I may not be on the same page as you active community members, but I'm wondering: why not put this feature in a popular client library, or ship it as a contrib package?

In CASSANDRA-1072 + CASSANDRA-1397, the counter increment is not idempotent, so it is difficult to align with Cassandra's consistency model. It is not worth adding a lot of code to the core code base just to serve a single feature.

In CASSANDRA-1421, the increment is idempotent and is easier to align with Cassandra. However, read performance could be poor because each read has to reconcile a lot of columns. Memory consumption on the Cassandra node might also be much higher than with the approach above, if I understand it correctly.

If you decide to put the feature in the client library, the library can take the same approach as CASSANDRA-142 and serialize the increments from a single writer to limit the number of columns generated. If the writers of a single counter are only hundreds of processes, I don't think it is a big deal for performance. If you are worried about performance on the client side because it serializes the increments of a single counter, maintain a queue for each counter; it is then easy to batch multiple updates from the same queue.

best regards,
hanzhu

On Fri, Sep 3, 2010 at 4:55 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> I still have not seen any response to my other misgivings about 1072 that I have raised on the ticket. Specifically, the existing patch is based around a Clock structure that, since 580 is a dead end, is no longer necessary.
>
> I'm also uneasy about adding 200k of code that meshes as poorly with the rest of Cassandra as this does. The more it can be split off into separate code paths, the better. Adding its own thrift method is a good start, but it should go deeper than that.
>
> On Thu, Sep 2, 2010 at 12:01 PM, Johan Oskarsson <jo...@oskarsson.nu> wrote:
> > In the last few months Digg and Twitter have been using a counter patch that lets Cassandra act as a high-volume realtime counting system.
> > Atomic counters enable new applications that were previously difficult to implement at scale, including realtime analytics and large-scale systems monitoring.
> >
> > Discussion
> > There are currently two different suggestions for how to implement counters in Cassandra. The discussion has so far been limited to those following the jiras (CASSANDRA-1072 and CASSANDRA-1421) closely and we don’t seem to be nearing a decision. I want to open it up to the Cassandra community at large to get additional feedback.
> >
> > Below are very basic and brief introductions to the alternatives. Please help us move forward by reading through the docs and jiras and reply to this thread with your thoughts. Would one or the other, both or neither be suitable for inclusion in Cassandra? Is there a third option? What can we do to reach a decision?
> >
> > We believe that both options can coexist; their strengths and weaknesses make them suitable for different use cases.
> >
> >
> > CASSANDRA-1072 + CASSANDRA-1397
> > https://issues.apache.org/jira/browse/CASSANDRA-1072 (see design doc)
> > https://issues.apache.org/jira/browse/CASSANDRA-1397
> >
> > How does it work?
> > A node is picked as the primary replica for each write. The context byte array for a column contains (primary replica ip, value). Any previous data with the same ip is reconciled with the new increment and put as the column value.
> >
> > Concerns raised
> > * an increment in flight will be lost if the wrong node goes down
> > * if an increment operation times out it’s impossible to know if it has been executed or not
> >
> > The most recent jira comment proposes a new API method for increments that reflects the different consistency level guarantees.
> >
> >
> > CASSANDRA-1421
> > https://issues.apache.org/jira/browse/CASSANDRA-1421
> >
> > How does it work?
> > Each increment for a counter is stored as a (UUID, value) tuple.
> > The read operations will read all these increment tuples for a counter, reconcile and return. On a regular interval the values are all read and reconciled into one value to reduce the amount of data required for each read operation.
> >
> > Concerns raised
> > * poor read performance, especially for time-series data
> > * post aggregation reconciliation issues
> >
> >
> > Again, we feel that both options can co-exist, especially if the 1072 patch uses a new API method that reflects its different consistency level guarantees. Our proposal is to accept 1072 into trunk with the new API method, and when an implementation of 1421 is completed it can be accepted alongside.
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
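
To make the client-side suggestion at the top of this message more concrete, here is a rough Python sketch of the per-counter queue idea. It is only an illustration under my own assumptions: `CounterBatcher`, `write_column` and `read_counter` are hypothetical names, not part of Cassandra or of any existing client library, and `write_column` stands in for whatever column insert the real client offers. Each flush writes one (UUID, delta) column per counter, in the style of the tuples described in the quoted message, so the number of columns grows with the number of flushes rather than the number of increments.

```python
import uuid
from collections import defaultdict


class CounterBatcher:
    """Hypothetical client-side helper (not a real Cassandra API).

    Queues increments per counter and flushes each queue as a single
    (UUID, delta) column. Intended for a single writer; it is not
    thread-safe as written.
    """

    def __init__(self, write_column):
        # write_column(counter_key, column_name, delta) is an assumed
        # callback standing in for a real client's column insert.
        self._write_column = write_column
        self._pending = defaultdict(list)

    def increment(self, counter_key, delta=1):
        # Only queue the update; nothing is sent to the cluster yet.
        self._pending[counter_key].append(delta)

    def flush(self):
        # Swap the queues out first so new increments go to a fresh batch.
        pending, self._pending = self._pending, defaultdict(list)
        for counter_key, deltas in pending.items():
            # One column per counter per flush. A real client would keep
            # this UUID and reuse it on retry so the write stays idempotent.
            self._write_column(counter_key, uuid.uuid4(), sum(deltas))


def read_counter(columns):
    """Reconcile a counter by summing its {UUID: delta} columns."""
    return sum(columns.values())
```

With an in-memory dict standing in for the column family, six queued increments on one counter end up as a single column after `flush()`, and `read_counter` only has that one column to reconcile.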