[ 
https://issues.apache.org/jira/browse/IGNITE-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641358#comment-16641358
 ] 

Ivan Pavlukhin edited comment on IGNITE-5935 at 10/18/18 3:12 PM:
------------------------------------------------------------------

If a node fails before finishing all the transactions it initiated, those 
transactions must be removed from the active list on the mvcc coordinator 
strictly after local transaction completion on each participating node. There 
are 2 cases, handled differently depending on the type of the failed node 
(client or server).
 # Transactions left by a server node are removed from the active list on PME.
 # Transactions left by a client node are removed from the active list after a 
cluster-wide vote: each node casts its vote once it has decided on the 
recovery of all such transactions on that node.
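
The client-node case can be illustrated with a minimal sketch of the 
bookkeeping on the coordinator side. This is not Ignite's actual code: the 
class ActiveTxVoteTracker, its methods and the string node ids are all 
hypothetical names introduced for illustration. It only shows the idea that 
transactions of a failed client stay in the active list until every server 
node has voted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of coordinator-side bookkeeping; not an Ignite class. */
class ActiveTxVoteTracker {
    /** Active transactions: tx id -> id of the (near) node that started it. */
    private final Map<Long, String> activeTxs = new HashMap<>();

    /** Failed client node id -> server nodes whose vote is still awaited. */
    private final Map<String, Set<String>> awaitedVotes = new HashMap<>();

    void registerTx(long txId, String nearNodeId) {
        activeTxs.put(txId, nearNodeId);
    }

    /** Starts a voting round for a failed client (assumed trigger). */
    void onClientFailed(String clientNodeId, Set<String> serverNodeIds) {
        if (serverNodeIds.isEmpty()) {
            removeTxsOf(clientNodeId);
            return;
        }
        awaitedVotes.put(clientNodeId, new HashSet<>(serverNodeIds));
    }

    /**
     * Records a vote sent by a server node after it has finished recovery of
     * the failed client's transactions locally.
     *
     * @return true if this was the last awaited vote and the client's
     *         transactions were removed from the active list.
     */
    boolean onVote(String clientNodeId, String voterNodeId) {
        Set<String> awaited = awaitedVotes.get(clientNodeId);
        if (awaited == null)
            return false; // No voting round in progress for this client.

        awaited.remove(voterNodeId);
        if (!awaited.isEmpty())
            return false; // Some nodes have not finished recovery yet.

        awaitedVotes.remove(clientNodeId);
        removeTxsOf(clientNodeId);
        return true;
    }

    boolean isActive(long txId) {
        return activeTxs.containsKey(txId);
    }

    private void removeTxsOf(String nodeId) {
        activeTxs.values().removeIf(nodeId::equals);
    }
}
```

Removing the entries only on the last vote is what keeps the "strictly after 
local completion on each participating node" ordering in this sketch.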

Also, _partition counters_ must be kept consistent among partition replicas 
after recovery. The current transaction commit protocol delivers _partition 
counters_ to backups on the _prepare_ phase. During recovery a transaction can 
be in a state where the primary has failed and one backup has received the 
counters while another has not. In that case the transaction should be rolled 
back and the counters should be aligned. Since the primary has failed, PME 
will occur. We must close all possible _gaps_ in the counters before PME 
completes. This is achieved with the following steps:
1. Exchange counters among sibling backups before finishing the recovering 
transactions.
2. Drain pending partition counter queues during PME.
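
A minimal sketch of what closing a gap could look like on one backup. It 
assumes a simplified counter model that is not taken from the Ignite code 
base: a contiguous counter value plus a queue of out-of-order ranges. The 
class PartitionCounterSketch and its methods are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified model of a partition update counter. */
class PartitionCounterSketch {
    /** All updates below this value are finished (applied or rolled back). */
    private long cntr;

    /** Out-of-order finished ranges waiting for a gap to close: start -> length. */
    private final TreeMap<Long, Long> pending = new TreeMap<>();

    PartitionCounterSketch(long initial) {
        cntr = initial;
    }

    /** Marks range [start, start + delta) as finished. */
    void update(long start, long delta) {
        if (start + delta <= cntr)
            return; // Already covered.

        if (start > cntr) {
            // There is a gap before this range, so queue it.
            pending.merge(start, delta, Math::max);
            return;
        }

        cntr = start + delta;

        // Absorb queued ranges that have become contiguous.
        while (!pending.isEmpty() && pending.firstKey() <= cntr) {
            Map.Entry<Long, Long> e = pending.pollFirstEntry();
            cntr = Math.max(cntr, e.getKey() + e.getValue());
        }
    }

    /** Step 2: drains the queue, skipping over any remaining gaps. */
    void drain() {
        while (!pending.isEmpty()) {
            Map.Entry<Long, Long> e = pending.pollFirstEntry();
            cntr = Math.max(cntr, e.getKey() + e.getValue());
        }
    }

    long get() {
        return cntr;
    }
}
```

In this model, suppose the failed primary had assigned range [5, 8) to the 
recovering transaction and only backup A received it, while a later range 
[8, 10) was committed on both backups. Step 1 corresponds to A telling B 
about the range so that both call update(5, 3) on rollback and reach 10. 
Step 2 corresponds to drain(), which closes whatever gaps are still left 
during PME.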


was (Author: pavlukhin):
If a node fails before finishing all initiated by it transactions they must be 
removed from active list on mvcc coordinator strictly after local transaction 
completion on each participating node. There are 2 cases handled differently 
depending on node type (client or server).
 # Transactions left by a server node are removed from the active list on PME.
 # Transactions left by a client node are removed from the active list after 
cluster-wide voting when each node gives a vote after making decision on all 
transactions recovery on that node.

Also _partition counters_ should be kept consistent among partition replicas 
after recovery. Current protocol delivers _partition counters_ to backups on 
_prepare_ phase. During recovery there could occur a situation when transaction 
is recovering case when primary has failed and one backup received counters and 
another do not. Such case is a rollback and counters should be aligned. As 
primary has failed PME will occur. We rely on counters alignment during PME.

> MVCC TX: Tx recovery protocol
> -----------------------------
>
>                 Key: IGNITE-5935
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5935
>             Project: Ignite
>          Issue Type: Task
>          Components: cache, mvcc
>            Reporter: Semen Boikov
>            Assignee: Ivan Pavlukhin
>            Priority: Major
>             Fix For: 2.7
>
>
> Transaction recovery procedure is initiated when near node failed before 
> transaction was finished.
> In MVCC transactions _partition update counter_ modification is started on 
> prepare phase. If a transaction was prepared at least on one node we need to 
> finish _partition update counter_ modification consistently on all 
> participating nodes.
> Also recovered transaction should be removed from active transactions list on 
> mvcc coordinator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
