[
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151256#comment-14151256
]
Blake Eggleston edited comment on CASSANDRA-6246 at 9/28/14 11:15 PM:
----------------------------------------------------------------------
bq. In the current implementation, we only keep the last commit per CQL
partition. We can do the same for this as well.
Yeah, I've been thinking about that some more. Just because we could keep a
bunch of historical data doesn't mean we should. There may be situations where
we need to keep more than one instance around, though, specifically when the
instance is part of a strongly connected component. Keeping some historical
data would be useful for helping nodes recover from short failures where they
miss several instances, but after a point, transmitting all the activity for
the last hour or two would just be nuts. The other issue with relying on
historical data for failure recovery is that you can't keep all of it, so you'd
end up with dangling pointers to the older instances.
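To make that concrete, here's a rough sketch of the retention rule I have in
mind: keep only the last executed instance per partition, except for instances
that belong to a strongly connected component, since their peers may still
point at them. This is just an illustration, not the actual patch, and all the
names are made up.
{code:java}
import java.nio.ByteBuffer;
import java.util.*;

// Hypothetical sketch: prune executed instances, but never drop anything that
// participates in a strongly connected component of the dependency graph,
// because those instances have to be kept (and executed) as a group.
class InstancePruner
{
    static class Instance
    {
        final UUID id;
        final ByteBuffer partitionKey;          // placeholder field names
        final Set<UUID> dependencies;           // ids this instance depends on
        boolean inStronglyConnectedComponent;   // set during execution ordering

        Instance(UUID id, ByteBuffer partitionKey, Set<UUID> dependencies)
        {
            this.id = id;
            this.partitionKey = partitionKey;
            this.dependencies = dependencies;
        }
    }

    // Keep only the most recently executed instance per partition, unless the
    // instance is part of an SCC.
    Collection<Instance> prune(List<Instance> executedInOrder)
    {
        Map<ByteBuffer, Instance> lastPerPartition = new HashMap<>();
        List<Instance> retained = new ArrayList<>();
        for (Instance inst : executedInOrder)
        {
            if (inst.inStronglyConnectedComponent)
                retained.add(inst);                            // peers may still reference it
            else
                lastPerPartition.put(inst.partitionKey, inst); // later execution wins
        }
        retained.addAll(lastPerPartition.values());
        return retained;
    }
}
{code}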
For longer partitions, and for nodes joining the ring, transmitting our current
dependency bookkeeping for the token ranges they're replicating, the
corresponding instances, and the current values for those instances should be
enough to get them going.
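Roughly, I'm picturing the transmitted state having a shape like the following
sketch. The types and field names are placeholders, and a String stands in for
a token range.
{code:java}
import java.nio.ByteBuffer;
import java.util.*;

// Hypothetical shape of the state handed to a bootstrapping node, or to one
// that has been down too long for instance-by-instance catch-up.
class EpaxosRecoveryPayload
{
    static class TokenRangeState
    {
        // current dependency bookkeeping for the range
        final Map<ByteBuffer, Set<UUID>> dependenciesByPartition = new HashMap<>();
        // the instances those dependency entries point at (serialized)
        final Map<UUID, byte[]> instances = new HashMap<>();
        // current values for those instances (the state they produced)
        final Map<ByteBuffer, ByteBuffer> currentValues = new HashMap<>();
    }

    // one entry per token range the receiving node replicates
    final Map<String, TokenRangeState> rangeStates = new HashMap<>();
}
{code}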
bq. I have also been reading about EPaxos recently and want to know: when do
you do the condition check in your implementation?
It would have to be when the instance is executed.
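To illustrate, here's a minimal sketch of a conditional (CAS-style) instance
whose IF condition is only evaluated once the instance reaches the execution
phase, i.e. after it's committed and its position in the dependency graph's
execution order is known. The names below are hypothetical.
{code:java}
import java.nio.ByteBuffer;
import java.util.Map;

// Sketch only: the condition check happens at execution time, against the
// state produced by all the instances ordered before this one.
class ConditionalExecution
{
    interface Condition
    {
        boolean applies(ByteBuffer currentValue);
    }

    static class CasInstance
    {
        final ByteBuffer key;
        final Condition condition;
        final ByteBuffer newValue;

        CasInstance(ByteBuffer key, Condition condition, ByteBuffer newValue)
        {
            this.key = key;
            this.condition = condition;
            this.newValue = newValue;
        }
    }

    // Called only when the instance is executed, never at proposal time.
    boolean execute(CasInstance instance, Map<ByteBuffer, ByteBuffer> store)
    {
        ByteBuffer current = store.get(instance.key);
        if (!instance.condition.applies(current))
            return false;                         // condition failed; no-op
        store.put(instance.key, instance.newValue);
        return true;
    }
}
{code}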
> EPaxos
> ------
>
> Key: CASSANDRA-6246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Blake Eggleston
> Priority: Minor
>
> One reason we haven't optimized our Paxos implementation with Multi-Paxos is
> that Multi-Paxos requires leader election and hence a period of unavailability
> when the leader dies.
> EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos,
> (2) is particularly useful across multiple datacenters, and (3) allows any
> node to act as coordinator:
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to
> implement it.