[
https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-11991:
-----------------------------------------
Status: Patch Available (was: Open)
For context, the problem is basically the one I described in [my
comment|https://issues.apache.org/jira/browse/CASSANDRA-9649?focusedCommentId=14601016&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601016]
on CASSANDRA-9649 and for which I suggested reverting CASSANDRA-7801.
Now, I was kind of wrong about reverting CASSANDRA-7801: since
CASSANDRA-9649 we have been relying on {{ClientState.getTimestamp()}} to give us
timestamps that are unique for the running VM, which means we can't blindly
revert CASSANDRA-7801.
What I think is the simplest solution however is to stop relying on that
property (of {{ClientState.getTimestamp()}}) for the uniqueness of our ballots,
but instead randomize the non-timestamp parts of the ballot for every new
ballot. With that, we don't have to revert CASSANDRA-7801, we just have to
ensure that if we use the last known proposal timestamp (i.e. if the clock of
whoever generated that timestamp is "in the future"), we don't persist it in the
local clock (this in turn means the timestamp might not be unique in the VM for 2
concurrent paxos operations, hence the need to randomize the rest of the
UUID).
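To make the idea concrete, here is a minimal sketch (not the actual patch). It assumes ballots are version-1 (time-based) UUIDs whose timestamp is given in microseconds; {{BallotSketch}}, {{ballotTimestampMicros}} and {{randomizedBallot}} are hypothetical names used for illustration only. The timestamp is chosen as a pure function of the local clock and the last known proposal timestamp, so nothing is written back to the local clock, and the clock-sequence and node bits are drawn at random for every ballot.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical illustration; these names are not taken from the Cassandra code base.
final class BallotSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100ns units.
    private static final long UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000L;

    // Pick the ballot timestamp without touching any shared clock state:
    // if another node's proposal is "in the future", use its timestamp for
    // this ballot only instead of persisting it locally.
    static long ballotTimestampMicros(long localMicros, long lastKnownProposalMicros) {
        return Math.max(localMicros, lastKnownProposalMicros);
    }

    // Build a version-1 UUID from the given timestamp, with the non-timestamp
    // bits (clock sequence and node) randomized for every new ballot.
    static UUID randomizedBallot(long timestampMicros) {
        long ts = timestampMicros * 10 + UUID_EPOCH_OFFSET_100NS; // 100ns units

        long msb = 0L;
        msb |= (ts & 0x00000000FFFFFFFFL) << 32;   // time_low
        msb |= (ts & 0x0000FFFF00000000L) >>> 16;  // time_mid
        msb |= (ts & 0x0FFF000000000000L) >>> 48;  // time_hi
        msb |= 0x1000L;                            // version 1

        long lsb = RANDOM.nextLong();
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // IETF variant bits

        return new UUID(msb, lsb);
    }
}
```

With this shape, two concurrent paxos operations that end up with the same timestamp still get distinct ballots with overwhelming probability, because 62 bits of each UUID are random.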
I've pushed a patch for this for 2.1. I'll attach branches for 2.2+ with tests
tomorrow (I was waiting on the 2.1 results before doing that), but I don't
think the modified code has changed since 2.1, so I'm marking this ready for
review in the meantime.
| [2.1|https://github.com/pcmanus/cassandra/commits/11991-2.1] |
[utests|http://cassci.datastax.com/job/pcmanus-11991-2.1-testall/] |
[dtests|http://cassci.datastax.com/job/pcmanus-11991-2.1-dtest/] |
> On clock skew, paxos may "corrupt" the node clock
> -------------------------------------------------
>
> Key: CASSANDRA-11991
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11991
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> We made a mistake in CASSANDRA-9649 so that a temporary clock skew on one node
> can "corrupt" other node clocks through Paxos. That wasn't intended and we
> should fix that. I'll attach a patch later.