Re: [DISCUSSION] High-volume counters in Cassandra

Ryan King Tue, 28 Sep 2010 12:25:36 -0700

Sorry, been catching up on this.

From Twitter's perspective, 1546 is probably insufficient because it
doesn't allow one to do time-series data without supercolumns (which
might work ok, but require a good deal of work). Additionally, one of
our deployed systems already does supercolumns of counters, which is
not feasible in this design at all.


-ryan

On Tue, Sep 28, 2010 at 10:12 AM, Jeremy Hanna
<jeremy.hanna1...@gmail.com> wrote:
> Is there any feedback from Twitter and Digg and perhaps SimpleGeo people 
> about CASSANDRA-1546?  Would that work so that you wouldn't have to maintain 
> a fork?
>
> On Sep 27, 2010, at 5:25 AM, Sylvain Lebresne wrote:
>
>> In CASSANDRA-1546, I propose an alternative to #1072. At its core,
>> it rewrites #1072 without the clocks structure (by splitting the clock into
>> individual columns, not unlike what Zhu Han proposed in his preceding
>> mail, but in a row instead of a super column, for the reasons explained in
>> the issue).
>>
>> But it is also my belief that it improves on the actual patch of #1072 in
>> the following ways:
>>  - it supports increments and decrements
>>  - it supports the usual consistency levels
>>  - it proposes an (optional) solution to the idempotency problem of
>>    increments (it's optional because it has a fairly slight performance
>>    cost that some may want to avoid if they understand the risk).
>>
>> When I say I propose, I mean that I wrote the patch (attached to the
>> JIRA ticket). I've just written it, so it is under-tested and has a few
>> details here and there to fix, but it should already be fairly functional
>> (it passes basic system tests).
>>
>> I welcome all comments on the patch. It was written with the goal of
>> addressing most of the concerns that have been raised about these
>> counters over the past few months (both in terms of performance and
>> implementation). It is my belief that it reaches this goal; hopefully
>> others will agree.
>>
>> --
>> Sylvain
>>
>> On Mon, Sep 27, 2010 at 5:32 AM, Zhu Han <schumi....@gmail.com> wrote:
>>>  I propose a new way to solve the counter problem in CASSANDRA-1502[1].
>>> Since I do not follow the JIRA updates very closely, I am pasting it here so
>>> that more people can comment on it and we can see whether it's feasible.
>>>
>>> "Seems like we have not found a solution acceptable to everybody. I will try
>>> to propose a new approach. Let's see whether anybody can shed some light on
>>> it and make it a reality.
>>>
>>> 1) We add a basic data structure, called a counter, which is a special type
>>> of super column.
>>>
>>> 2) The name of each column in the counter super column is the host name of
>>> a Cassandra node, and the value is the result calculated by that node.
>>>
>>> 3) WRITE PATH: Once a node receives an add/dec request for a counter, it
>>> de-serializes its local counter super column and atomically updates the
>>> column named after itself. After that, it propagates the updated column
>>> value to the other replicas, just as the mutation of a normal column is
>>> propagated. Different consistency levels can be supported as before.
>>>
>>> 4) READ PATH: Depending on the consistency level, contact several replicas,
>>> read back the counter super column as a whole, and get the latest counter
>>> value by summing up all columns in the counter. Read-repair logic can work
>>> as before.
>>>
>>> IMHO, the biggest advantage of this approach is that it re-uses as many
>>> mechanisms already in the code as possible, so it might not be so disruptive.
>>> But adding a new Thrift API is inevitable."
>>>
>>> NB: If it's feasible, I might not be the right person to work on it, as I have
>>> not touched the internals of Cassandra for more than a year. I just want to
>>> contribute something to help us reach consensus.
>>>
>>> [1]
>>> https://issues.apache.org/jira/browse/CASSANDRA-1502?focusedCommentId=12915103&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12915103
>>>
>>> best regards,
>>> hanzhu
>>>
>>>
>>> On Sun, Sep 26, 2010 at 9:49 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>
>>>> you have misunderstood.  if we continue the 1072 approach of writing
>>>> counter data to the clock field, this is necessarily incompatible with
>>>> the right way of writing counter data to the value field.  it's no
>>>> longer simply a matter of reversing 1070.
>>>>
>>>> On Sat, Sep 25, 2010 at 11:50 PM, Zhu Han <schumi....@gmail.com> wrote:
>>>>> Jonathan,
>>>>>
>>>>> This is a personal email.
>>>>>
>>>>> On Sun, Sep 26, 2010 at 1:27 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>>>
>>>>>> On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han <schumi....@gmail.com> wrote:
>>>>>>> Can we just let the patch be committed but mark it as "alpha" or
>>>>>>> "experimental"?
>>>>>>
>>>>>> I explained exactly why that is not a good approach here:
>>>>>> http://www.mail-archive.com/dev@cassandra.apache.org/msg00917.html
>>>>>>
>>>>> Yes, I see. But the clock structure has been in trunk since Cassandra-1070.
>>>>> We still need to clean it out regardless, and we need somebody to volunteer
>>>>> for this work. Considering the complexity of Cassandra-1070, a programmer
>>>>> with in-depth knowledge of that patch is preferable, and it will take
>>>>> some time to do.
>>>>>
>>>>> Fortunately, Johan Oskarsson has promised to take it on in a comment on
>>>>> Cassandra-1072[1]:
>>>>>
>>>>> "The clock changes would get into trunk quicker if we didn't, avoiding the
>>>>> extra overhead of a big patch during reviews, merge with trunk, code updates
>>>>> and publication of a new patch.
>>>>> If the concern is that we won't attend to the clocks once this patch is in I
>>>>> can promise that we'll look at it straight away."
>>>>>
>>>>> And if Twitter/Digg/SimpleGeo fork their own trees of Cassandra, this will
>>>>> give a big marketing opportunity to supporters of other NoSQL systems. As
>>>>> you know, the competition is quite fierce currently.
>>>>>
>>>>> So, instead of staying in this awkward situation, why not change to
>>>>> another strategy:
>>>>>
>>>>>> "Fork another experimental tree from 0.7 beta 1 and accept
>>>>>> Cassandra-1072.  At the same time, start the clean-up work on this tree.
>>>>>> Once it's finalized, merge it back into 0.7, whether that is 0.7.1 or 0.7.2.
>>>>>>
>>>>>> Hence, the people at Twitter do not need to maintain a huge
>>>>>> out-of-tree patch, while the quality impact of Cassandra-1072 is still
>>>>>> limited."
>>>>>
>>>>> I do know the pain of maintaining a large patch outside the official tree.
>>>>> Once it gets in, everybody will feel much better.
>>>>>
>>>>> If you give this patch a chance, Johan and others can be highly
>>>>> motivated because the whole community is working together.  It's a
>>>>> compromise, but it's worth it.
>>>>>
>>>>> [1]
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-1072?focusedCommentId=12909234&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12909234
>>>>>
>>>>>
>>>>>>
>>>>>> --
>>>>>> Jonathan Ellis
>>>>>> Project Chair, Apache Cassandra
>>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>>> http://riptano.com
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> http://riptano.com
>>>>
>>>
>
>

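For readers following the thread: the per-node counter layout that Zhu Han
describes in steps 1) to 4) of his quoted proposal can be sketched roughly as
below. This is a minimal in-memory Python illustration, not Cassandra code.
The names ReplicaCounter, add, merge and value are hypothetical, and the
per-column version number used to pick a winner when merging is an assumption
standing in for whatever ordering (such as column timestamps) a real
implementation would use.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ReplicaCounter:
    """One replica's copy of a counter: one column per node (hypothetical model)."""
    node: str
    # column name (host name) -> (version, partial count written by that host)
    columns: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def add(self, delta: int) -> Tuple[str, int, int]:
        """Write path: update only the column named after this node.

        A negative delta is a decrement. Returns the updated column, which is
        what would be propagated to the other replicas.
        """
        version, count = self.columns.get(self.node, (0, 0))
        updated = (version + 1, count + delta)
        self.columns[self.node] = updated
        return (self.node, updated[0], updated[1])

    def merge(self, name: str, version: int, count: int) -> None:
        """Apply a propagated column; the higher version wins (assumed rule)."""
        current_version, _ = self.columns.get(name, (0, 0))
        if version > current_version:
            self.columns[name] = (version, count)

    def value(self) -> int:
        """Read path: the counter value is the sum of all per-node columns."""
        return sum(count for _, count in self.columns.values())


if __name__ == "__main__":
    a, b = ReplicaCounter("node-a"), ReplicaCounter("node-b")
    b.merge(*a.add(5))    # node-a increments, then propagates its column
    a.merge(*b.add(-2))   # node-b decrements, then propagates its column
    print(a.value(), b.value())
```

Because each node only ever rewrites its own column, and a propagated column
carries that node's full partial count rather than a delta, applying the same
propagated column twice leaves the total unchanged in this sketch. That
property covers replica-to-replica propagation only; it does not by itself
address a client retrying an increment request.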