[
https://issues.apache.org/jira/browse/CASSANDRA-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479167#comment-17479167
]
Branimir Lambov edited comment on CASSANDRA-17164 at 1/20/22, 8:45 AM:
-----------------------------------------------------------------------
{quote}I think this is actually the typical way of implementing Classic Paxos,
even though Lamport's paper seems to suggest you must only contact the nodes
that responded to the prepare (there may be something else specific about his
formulation that necessitates this, I forget, as I dislike his writings on the
topic). This is corroborated by Heidi Howard's dissertation, which was the
easiest place I could find a straight-forward formulation of Classic Paxos
besides that of Lamport. See Algorithm 3 on Page 30.
{quote}
This is an excellent pointer, thank you. Lamport's proof also works with
separate proposal quorum – my only request here is that this should be stated
somewhere as other contributors might start with Lamport's original definition
too.
{quote}I don't quite follow what you mean by this, as this is not limited to
"most recent commit", but a ballot directly maps to the instance id of classic
paxos, it just avoids pre-splitting the range of integers.
{quote}
Well, I don't follow this one. A ballot id is something one replica selects and
sends it as a prepare message to everyone. At this point some nodes may have
committed a proposal, others may have accepted it, and yet others may have
never heard of it. From the point of view of a sequence of paxos instances, the
first two will definitely make promises for different instances, and likely the
third one will be promising for yet another one. The fact that we tie accepting
a promise to a majority with matching "most recent commit" means that we
effectively treat it as the identifier of the current session.
More interestingly, we can then send a proposal to nodes with older most recent
commit (i.e. nodes that are effectively working on a previous paxos instance),
that proposal will be accepted (overwriting the decided value but not advancing
the most recent commit), and this might lead to a decision using just a small
number of participants with the right "most recent commit". Specifically, what
stops the following from happening?
{code:java}
A mrc A accepted A promised B mrc B
accepted B promised C mrc C accepted C promised
prepare(1) -> ABC - - 1 -
- 1 - - 1
propose(1, X) -> ABC - 1,X 1 -
1,X 1 - 1,X 1
commit(1, X) -> AB 1,X - 1 1,X
- 1 - - 1
prepare(2) -> AB 1,X - 2 1,X
- 2 - 1,X 1
majority AB, no refresh
propose(2, Y) -> BC 1,X - 2 1,X
2,Y 2 - 2,Y 2
returns successful Y
prepare(3) -> AC 1,X 1,X 3 1,X
2,Y 2 1,X - 3
refresh stale, then forms majority AC
propose(3, Z) -> ABC 1,X 3,Z 3 1,X
3,Z 3 1,X 3,Z 3
returns successful Z
{code}
If this is permitted, there's a further error possibility if B is partitioned
after the 2,Y proposal succeeds and a commit is executed in B. Then the
committed value can be read from B, and later wiped with a conflicting decision.
In LWT terms, if
X: {{{}column = X if not exists{}}},
Y: {{column = Y if column == X}} and
Z: {{{}column = Z if column == X{}}},
we may be able to read (serially) both Y and Z.
{quote}Could you explain what you are referring to here? I think this is all
standard stuff for Paxos, we're again just recording the most recently used
instance number for each register.
{quote}
Judging from the above, this may be insufficient. I'm likely missing something
here – that something needs to be documented.
{quote}The final commit phase is only required to ensure any "decree"
(decision) is disseminated. If we have proposed that no decree be made, there
is nothing to disseminate, and nothing to complete if another transaction
encounters it. This is in some ways an artefact of the feature of Cassandra's
implementation, that we initiate a paxos round without knowing if it will do
anything, though this feature would I suppose be present for read-only
operations anyway.
{quote}
This is also hard to read from the code: {{Paxos.cas}} does not issue a commit
for an empty proposal, but it seems the next {{prepare}} will return
{{FOUND_INCOMPLETE_ACCEPTED}} and trigger reproposal and commit. This will work
correctly, but perform one roundtrip more than simply committing when the
agreement was reached. What am I missing?
was (Author: blambov):
bq. I think this is actually the typical way of implementing Classic Paxos,
even though Lamport's paper seems to suggest you must only contact the nodes
that responded to the prepare (there may be something else specific about his
formulation that necessitates this, I forget, as I dislike his writings on the
topic). This is corroborated by Heidi Howard's dissertation, which was the
easiest place I could find a straight-forward formulation of Classic Paxos
besides that of Lamport. See Algorithm 3 on Page 30.
This is an excellent pointer, thank you. Lamport's proof also works with
separate proposal quorum -- my only request here is that this should be stated
somewhere as other contributors might start with Lamport's original definition
too.
bq. I don't quite follow what you mean by this, as this is not limited to "most
recent commit", but a ballot directly maps to the instance id of classic paxos,
it just avoids pre-splitting the range of integers.
Well, I don't follow this one. A ballot id is something one replica selects and
sends it as a prepare message to everyone. At this point some nodes may have
committed a proposal, others may have accepted it, and yet others may have
never heard of it. From the point of view of a sequence of paxos instances, the
first two will definitely make promises for different instances, and likely the
third one will be promising for yet another one. The fact that we tie accepting
a promise to a majority with matching "most recent commit" means that we
effectively treat it as the identifier of the current session.
More interestingly, we can then send a proposal to nodes with older most recent
commit (i.e. nodes that are effectively working on a previous paxos instance),
that proposal will be accepted (overwriting the decided value but not advancing
the most recent commit), and this might lead to a decision using just a small
number of participants with the right "most recent commit". Specifically, what
stops the following from happening?
{code}
A mrc A accepted A promised B mrc B
accepted B promised C mrc C accepted C promised
prepare(1) -> ABC - - 1 -
- 1 - - 1
propose(1, X) -> ABC - 1,X 1 -
1,X 1 - 1,X 1
commit(1, X) -> AB 1,X - 1 1,X
- 1 - - 1
prepare(2) -> AB 1,X - 2 1,X
- 2 - 1,X 1
majority AB, no refresh
propose(2, Y) -> BC 1,X - 2 1,X
2,Y 2 - 2,Y 2
returns successful Y
prepare(3) -> AC 1,X 1,X 3 1,X
2,Y 2 1,X - 3
refresh stale, then forms majority AC
propose(3, Z) -> ABC 1,X 3,Z 3 1,X
3,Z 3 1,X 3,Z 3
returns successful Z
{code}
If this is permitted, there's a further error possibility if B is partitioned
after the 2,Y proposal succeeds and a commit is executed in B. Then the
committed value can be read from B, and later wiped with a conflicting
decision.
In LWT terms, if
X: {{column = X if not exists}},
Y: {{column = Y if column == X}} and
Z: {{column = Z if column == X}},
we may be able to read both Y and Z.
bq. Could you explain what you are referring to here? I think this is all
standard stuff for Paxos, we're again just recording the most recently used
instance number for each register.
Judging from the above, this may be insufficient. I'm likely missing something
here -- that something needs to be documented.
bq. The final commit phase is only required to ensure any "decree" (decision)
is disseminated. If we have proposed that no decree be made, there is nothing
to disseminate, and nothing to complete if another transaction encounters it.
This is in some ways an artefact of the feature of Cassandra's implementation,
that we initiate a paxos round without knowing if it will do anything, though
this feature would I suppose be present for read-only operations anyway.
This is also hard to read from the code: {{Paxos.cas}} does not issue a commit
for an empty proposal, but it seems the next {{prepare}} will return
{{FOUND_INCOMPLETE_ACCEPTED}} and trigger reproposal and commit. This will work
correctly, but perform one roundtrip more than simply committing when the
agreement was reached. What am I missing?
> CEP-14: Paxos Improvements
> --------------------------
>
> Key: CASSANDRA-17164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17164
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Coordination, Consistency/Repair
> Reporter: Benedict Elliott Smith
> Assignee: Benedict Elliott Smith
> Priority: Normal
> Fix For: 4.1
>
>
> This ticket encompasses work for [CEP-14|
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements].
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]