[
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151256#comment-14151256
]
Blake Eggleston edited comment on CASSANDRA-6246 at 9/28/14 11:15 PM:
----------------------------------------------------------------------
bq. In the current implementation, we only keep the last commit per CQL
partition. We can do the same for this as well.
Yeah, I've been thinking about that some more. Just because we could keep a
bunch of historical data doesn't mean we should. There may be situations where
we need to keep more than one instance around, though, specifically when the
instance is part of a strongly connected component. Keeping some historical
data would be useful for helping nodes recover from short failures where they
miss several instances, but after a point, transmitting all the activity for
the last hour or two would just be nuts. The other issue with relying on
historical data for failure recovery is that you can't keep all of it, so you'd
end up with dangling pointers to the older instances.
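To make that concrete, here's a rough sketch of the retention rule I have in
mind: keep only the last executed instance per partition, except for instances
that belong to a strongly connected component, since their peers may still
point at them. This is just an illustration, not the actual patch, and all the
names are made up.
{code:java}
import java.nio.ByteBuffer;
import java.util.*;

// Hypothetical sketch: prune executed instances, but never drop anything that
// participates in a strongly connected component of the dependency graph,
// because those instances have to be kept (and executed) as a group.
class InstancePruner
{
    static class Instance
    {
        final UUID id;
        final ByteBuffer partitionKey;          // placeholder field names
        final Set<UUID> dependencies;           // ids this instance depends on
        boolean inStronglyConnectedComponent;   // set during execution ordering

        Instance(UUID id, ByteBuffer partitionKey, Set<UUID> dependencies)
        {
            this.id = id;
            this.partitionKey = partitionKey;
            this.dependencies = dependencies;
        }
    }

    // Keep only the most recently executed instance per partition, unless the
    // instance is part of an SCC.
    Collection<Instance> prune(List<Instance> executedInOrder)
    {
        Map<ByteBuffer, Instance> lastPerPartition = new HashMap<>();
        List<Instance> retained = new ArrayList<>();
        for (Instance inst : executedInOrder)
        {
            if (inst.inStronglyConnectedComponent)
                retained.add(inst);                            // peers may still reference it
            else
                lastPerPartition.put(inst.partitionKey, inst); // later execution wins
        }
        retained.addAll(lastPerPartition.values());
        return retained;
    }
}
{code}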
For longer partitions, and for nodes joining the ring, transmitting our current
dependency bookkeeping for the token ranges they're replicating, the
corresponding instances, and the current values for those instances should be
enough to get them going.
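Roughly, I'm picturing the transmitted state having a shape like the following
sketch. The types and field names are placeholders, and a String stands in for
a token range.
{code:java}
import java.nio.ByteBuffer;
import java.util.*;

// Hypothetical shape of the state handed to a bootstrapping node, or to one
// that has been down too long for instance-by-instance catch-up.
class EpaxosRecoveryPayload
{
    static class TokenRangeState
    {
        // current dependency bookkeeping for the range
        final Map<ByteBuffer, Set<UUID>> dependenciesByPartition = new HashMap<>();
        // the instances those dependency entries point at (serialized)
        final Map<UUID, byte[]> instances = new HashMap<>();
        // current values for those instances (the state they produced)
        final Map<ByteBuffer, ByteBuffer> currentValues = new HashMap<>();
    }

    // one entry per token range the receiving node replicates
    final Map<String, TokenRangeState> rangeStates = new HashMap<>();
}
{code}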
bq. I have also been reading about EPaxos recently and want to know: when do
you do the condition check in your implementation?
It would have to be when the instance is executed.
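To illustrate, here's a minimal sketch of a conditional (CAS-style) instance
whose IF condition is only evaluated once the instance reaches the execution
phase, i.e. after it's committed and its position in the dependency graph's
execution order is known. The names below are hypothetical.
{code:java}
import java.nio.ByteBuffer;
import java.util.Map;

// Sketch only: the condition check happens at execution time, against the
// state produced by all the instances ordered before this one.
class ConditionalExecution
{
    interface Condition
    {
        boolean applies(ByteBuffer currentValue);
    }

    static class CasInstance
    {
        final ByteBuffer key;
        final Condition condition;
        final ByteBuffer newValue;

        CasInstance(ByteBuffer key, Condition condition, ByteBuffer newValue)
        {
            this.key = key;
            this.condition = condition;
            this.newValue = newValue;
        }
    }

    // Called only when the instance is executed, never at proposal time.
    boolean execute(CasInstance instance, Map<ByteBuffer, ByteBuffer> store)
    {
        ByteBuffer current = store.get(instance.key);
        if (!instance.condition.applies(current))
            return false;                         // condition failed; no-op
        store.put(instance.key, instance.newValue);
        return true;
    }
}
{code}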
> EPaxos
> ------
>
> Key: CASSANDRA-6246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Blake Eggleston
> Priority: Minor
>
> One reason we haven't optimized our Paxos implementation with Multi-Paxos is
> that Multi-Paxos requires leader election and hence a period of unavailability
> when the leader dies.
> EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos,
> (2) is particularly useful across multiple datacenters, and (3) allows any
> node to act as coordinator:
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to
> implement it.