[ https://issues.apache.org/jira/browse/IGNITE-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641358#comment-16641358 ]
Ivan Pavlukhin edited comment on IGNITE-5935 at 10/18/18 3:12 PM:
------------------------------------------------------------------

If a node fails before finishing all transactions it initiated, those transactions must be removed from the active list on the mvcc coordinator strictly after local transaction completion on each participating node. There are 2 cases, handled differently depending on the node type (client or server).
# Transactions left by a server node are removed from the active list on PME (partition map exchange).
# Transactions left by a client node are removed from the active list after cluster-wide voting, in which each node casts its vote once it has decided on the recovery of all such transactions on that node.

Also, _partition counters_ should be kept consistent among partition replicas after recovery. The current transaction commit protocol delivers _partition counters_ to backups on the _prepare_ phase. During recovery a situation can occur where a transaction is being recovered after the primary has failed, and one backup has received the counters while another has not. In that case the transaction should be rolled back and the counters should be aligned. As the primary has failed, PME will occur. We must close all possible _gaps_ in counters before PME is complete. This is achieved with the following steps:
1. Exchange counters among sibling backups before finishing the recovering transactions.
2. Drain pending partition counter queues during PME.

was (Author: pavlukhin):
If a node fails before finishing all transactions it initiated, those transactions must be removed from the active list on the mvcc coordinator strictly after local transaction completion on each participating node. There are 2 cases, handled differently depending on the node type (client or server).
# Transactions left by a server node are removed from the active list on PME.
# Transactions left by a client node are removed from the active list after cluster-wide voting, in which each node casts its vote once it has decided on the recovery of all such transactions on that node.

Also, _partition counters_ should be kept consistent among partition replicas after recovery. The current protocol delivers _partition counters_ to backups on the _prepare_ phase. During recovery a situation can occur where a transaction is being recovered after the primary has failed, and one backup has received the counters while another has not. Such a case results in a rollback, and the counters should be aligned. As the primary has failed, PME will occur. We rely on counter alignment during PME.

> MVCC TX: Tx recovery protocol
> -----------------------------
>
> Key: IGNITE-5935
> URL: https://issues.apache.org/jira/browse/IGNITE-5935
> Project: Ignite
> Issue Type: Task
> Components: cache, mvcc
> Reporter: Semen Boikov
> Assignee: Ivan Pavlukhin
> Priority: Major
> Fix For: 2.7
>
>
> The transaction recovery procedure is initiated when the near node fails before
> the transaction is finished.
> In MVCC transactions, _partition update counter_ modification is started on
> the prepare phase. If a transaction was prepared on at least one node, we need to
> finish _partition update counter_ modification consistently on all
> participating nodes.
> Also, a recovered transaction should be removed from the active transactions list on
> the mvcc coordinator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
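
For readers unfamiliar with counter gaps, the two alignment steps from the edited comment (exchange counters among sibling backups, then drain pending queues) can be illustrated with a toy model. This is only a sketch under stated assumptions: `BackupCounter`, `apply`, `drain` and `align` are hypothetical names invented for illustration, not Ignite classes or methods, and the real recovery code may well work differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BackupCounter:
    """Toy model of one backup's partition update counter (hypothetical, not Ignite code)."""

    applied: int = 0                                 # highest counter with no gap below it
    pending: set[int] = field(default_factory=set)   # counters received above a gap

    def apply(self, counter: int) -> None:
        """Record a counter; advance `applied` only while the sequence is contiguous."""
        self.pending.add(counter)
        while self.applied + 1 in self.pending:
            self.applied += 1
            self.pending.remove(self.applied)

    def highest_seen(self) -> int:
        """Highest counter this backup knows about, including ones above a gap."""
        return max(self.pending, default=self.applied)

    def drain(self) -> None:
        """Close remaining gaps by jumping to the highest counter seen."""
        self.applied = self.highest_seen()
        self.pending.clear()


def align(backups: list[BackupCounter]) -> None:
    """Assumed alignment: share the highest counter among siblings, then drain."""
    # Step 1: exchange counters, every backup learns the highest value any sibling saw.
    target = max(b.highest_seen() for b in backups)
    for b in backups:
        if target > b.applied:
            b.pending.add(target)
        # Step 2: drain the pending queue so no gap is left behind.
        b.drain()
```

In this model, a backup that saw counters 1, 2, 3 and a sibling that saw 1, 2, 5 both end up at 5 with empty pending queues after `align`, which is the "no gaps before PME completes" property the comment asks for.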