In CASSANDRA-1546, I propose an alternative to #1072. At its core, it rewrites #1072 without the clocks structure (by splitting the clock into individual columns, not unlike what Zhu Han proposed in his preceding mail, but in a row instead of a super column, for reasons explained in the issue).
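To make the idea of splitting a counter into individual per-node columns more concrete, here is a minimal sketch in Python. It is not the code of the CASSANDRA-1546 patch: the class names (`CounterRow`, `NodeShard`), the methods, and the use of a per-node version number to pick the newer copy of a column during reconciliation are all hypothetical assumptions chosen for illustration. Each node only ever writes its own column, increments and decrements are both just signed deltas, and reading the counter means summing the columns.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class NodeShard:
    """One column of a counter row: the partial count owned by a single node.

    `version` is a hypothetical per-node logical clock, bumped on every local
    update, used here to decide which copy of a column is newer.
    """
    count: int
    version: int


@dataclass
class CounterRow:
    """A counter stored as a row with one column per node (hypothetical model)."""
    shards: Dict[str, NodeShard] = field(default_factory=dict)

    def add(self, node: str, delta: int) -> None:
        """Apply an increment (delta > 0) or decrement (delta < 0) on `node`.

        Only the node that receives the request touches its own column, so
        two nodes never write the same column.
        """
        old = self.shards.get(node, NodeShard(count=0, version=0))
        self.shards[node] = NodeShard(count=old.count + delta,
                                      version=old.version + 1)

    def merge(self, other: "CounterRow") -> None:
        """Reconcile with a replica's copy: per column, keep the newer shard."""
        for node, theirs in other.shards.items():
            mine = self.shards.get(node)
            if mine is None or theirs.version > mine.version:
                self.shards[node] = theirs

    def value(self) -> int:
        """Read path: the counter value is the sum of all per-node columns."""
        return sum(shard.count for shard in self.shards.values())


# Usage: two replicas, each node updates only its own column.
replica_a, replica_b = CounterRow(), CounterRow()
replica_a.add("node-a", 5)
replica_a.add("node-a", -2)      # a decrement is a negative delta
replica_b.add("node-b", 3)

replica_a.merge(replica_b)       # propagate columns in both directions
replica_b.merge(replica_a)
replica_a.merge(replica_b)       # re-merging is harmless: nothing is double-counted
print(replica_a.value(), replica_b.value())  # 6 6
```

Because a node's column is only written by that node, merging the same replica twice does not double-count. This does not make a client retry idempotent, though: re-sending an `add` after a timeout would still apply the delta twice, which is the separate idempotency problem of increments.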
But I also believe that it improves on the actual patch of #1072 in the following ways:
- it supports increments and decrements;
- it supports the usual consistency levels;
- it proposes an (optional) solution to the idempotency problem of increments (it's optional because it has a fairly slight performance cost that some may want to avoid if they understand the risk).

When I say I propose, I mean that I actually wrote the patch (attached to the JIRA ticket). I've only just written it, so it is really under-tested and has a few details here and there to fix, but it should already be fairly functional (it passes basic system tests).

I welcome all comments on the patch. It was written with the goal of addressing most of the concerns that have been raised about these counters over the past few months (both in terms of performance and implementation). I believe it reaches this goal; hopefully others will agree.

--
Sylvain

On Mon, Sep 27, 2010 at 5:32 AM, Zhu Han <schumi....@gmail.com> wrote:
> I propose a new way to solve the counter problem in cassandra-1502[1].
> Since I do not follow the JIRA updates very carefully, I am pasting it here
> so that more people can comment on it and we can see whether it's feasible.
>
> "It seems we have not found a solution acceptable to everybody. I will try to
> propose a new approach. Let's see whether anybody can shed some light on it
> and make it a reality.
>
> 1) We add a basic data structure, called a counter, which is a special type
> of super column.
>
> 2) The name of each column in the counter super column is the host name of
> a Cassandra node, and the value is the calculated result from that node.
>
> 3) WRITE PATH: Once a node receives the add/dec request of a counter, it
> de-serializes its local counter super column and atomically updates the
> column named after itself.
> After that, it propagates the updated column value to the
> other replicas, just like the mutation of a normal column is propagated
> to other replicas. Different consistency levels can be supported as before.
>
> 4) READ PATH: Depending on the consistency level, contact several replicas,
> read back the counter super column as a whole, and get the latest counter
> value by summing up all columns in the counter. Read-repair logic can work
> as before.
>
> IMHO, the biggest advantage of this approach is that it reuses as many of
> the mechanisms already in the code as possible, so it might not be so
> disruptive. But adding a new Thrift API is inevitable."
> NB: If it's feasible, I might not be the right man to work on it, as I have
> not touched the internals of Cassandra for more than a year. I want to
> contribute something to help us reach consensus.
>
> [1]
> https://issues.apache.org/jira/browse/CASSANDRA-1502?focusedCommentId=12915103&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12915103
>
> best regards,
> hanzhu
>
>
> On Sun, Sep 26, 2010 at 9:49 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> you have misunderstood. if we continue the 1072 approach of writing
>> counter data to the clock field, this is necessarily incompatible with
>> the right way of writing counter data to the value field. it's no
>> longer simply a matter of reversing 1070.
>>
>> On Sat, Sep 25, 2010 at 11:50 PM, Zhu Han <schumi....@gmail.com> wrote:
>> > Jonathan,
>> >
>> > This is a personal email.
>> >
>> > On Sun, Sep 26, 2010 at 1:27 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >>
>> >> On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han <schumi....@gmail.com> wrote:
>> >> > Can we just let the patch be committed but mark it as "alpha" or
>> >> > "experimental"?
>> >>
>> >> I explained exactly why that is not a good approach here:
>> >> http://www.mail-archive.com/dev@cassandra.apache.org/msg00917.html
>> >>
>> > Yes, I see.
>> > But the clock structure has been in trunk since Cassandra-1070. We
>> > still need to clean it out regardless, and we need somebody to
>> > volunteer for this work. Considering the complexity of Cassandra-1070,
>> > a programmer with in-depth knowledge of that patch is preferable, and
>> > it will take some time to do it.
>> >
>> > Fortunately, Johan Oskarsson has promised to take it on in a comment on
>> > Cassandra-1072[1]:
>> >
>> > "The clock changes would get into trunk quicker if we didn't, avoiding the
>> > extra overhead of a big patch during reviews, merge with trunk, code updates
>> > and publication of a new patch.
>> > If the concern is that we won't attend to the clocks once this patch is in I
>> > can promise that we'll look at it straight away. "
>> >
>> > And if Twitter/Digg/SimpleGeo fork their own Cassandra trees, this will
>> > give big marketing opportunities to supporters of other NoSQL systems. As
>> > you know, the competition is quite fierce currently.
>> >
>> > So, instead of staying in this embarrassing situation, why not change to
>> > another strategy:
>> >
>> >> "Fork another experimental tree from 0.7 beta 1 and accept
>> >> Cassandra-1072. At the same time, start the cleanup work on this tree.
>> >> Once it's finalized, merge it back to 0.7, whether that is 0.7.1 or
>> >> 0.7.2.
>> >>
>> >> Hence, the folks from Twitter do not need to maintain a huge
>> >> out-of-tree patch, while the quality impact of cassandra-1072 is still
>> >> limited.
>> >
>> > I do know the pain of maintaining a large patch out of the official tree.
>> > Once it gets in, everybody will feel much better.
>> >
>> > If you give this patch a chance, Johan or others will be highly
>> > motivated, because the whole community is working together. It's a
>> > compromise, but it's worth it.
>> >
>> > [1]
>> > https://issues.apache.org/jira/browse/CASSANDRA-1072?focusedCommentId=12909234&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12909234
>> >
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of Riptano, the source for professional Cassandra support
>> >> http://riptano.com
>> >
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>