For now, I think the two-phase await is the only option. Once the fix is prototyped, we need to benchmark it and check how the change affects PME timing.

2018-03-20 18:09 GMT+03:00 Dmitry Pavlov <[email protected]>:

> Hi Igniters,
>
> I prefer option 1 because throwing any exceptions is bad for product
> usability. I think we should do it that way only if it is unavoidable.
>
> At the same time, it would be good if we could provide a solution that is
> just as reliable but optimized in terms of message count.
>
> Please share your vision.
>
> Sincerely,
> Dmitriy Pavlov
>
> Mon, 19 Mar 2018 at 20:15, Pavel Kovalenko <[email protected]>:
>
> > Hello Igniters,
> >
> > The current implementation of
> > GridDhtPartitionsExchangeFuture#waitPartitionRelease doesn't give us a
> > 100% guarantee that, after this method completes, there are no ongoing
> > atomic or transactional updates on the current node during the main
> > stage of PME.
> > It only guarantees that all primary updates will be finished on that
> > node, while we can still receive and process backup updates after this
> > method.
> > An example of such a case is described in
> > https://issues.apache.org/jira/browse/IGNITE-7871
> >
> > To avoid such situations we would like to implement a second phase of
> > the waitPartitionRelease method.
> > In this phase every server node participating in PME should wait until
> > all other server nodes finish their ongoing updates.
> >
> > Here is a brief description of the algorithm:
> >
> > Non-coordinator node:
> > 1) Finish all ongoing atomic & transactional updates.
> > 2) Send acknowledgement to coordinator.
> > 3) Wait for final acknowledgement from coordinator, that all nodes
> >    finished their updates.
> > 4) Continue PME.
> >
> > Coordinator node:
> > 1) Finish all ongoing atomic & transactional updates.
> > 2) Wait for all acknowledgements from all server nodes.
> > 3) Send final acknowledgement to all server nodes.
> > 4) Continue PME.
> >
> > Acknowledgement messages have tiny size, so network pressure and overall
> > performance drop will be minimal.
> >
> > Another solution to the problem is simply cancelling atomic backup
> > updates and transactional backup updates in the PREPARED phase if the
> > topology version has changed.
> > But from the user's perspective it's not correct to catch transaction
> > errors even in cases when a node is joining the cluster.
> >
> > Any thoughts?
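
To make the two-phase wait from the quoted proposal concrete, here is a rough single-JVM model of it. This is plain Java, not Ignite code: the class name TwoPhaseReleaseSketch, the simulate method, and the use of CountDownLatch in place of real acknowledgement messages are assumptions made purely for illustration, and Thread.sleep stands in for "finish all ongoing atomic & transactional updates".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-process model of the two-phase wait; not the Ignite implementation. */
public class TwoPhaseReleaseSketch {
    /**
     * Runs one simulated exchange among serverCnt nodes (node 0 is the coordinator).
     * Returns how many nodes continued PME while some node still had unfinished updates.
     */
    public static int simulate(int serverCnt) throws Exception {
        // Phase 1: every non-coordinator acknowledges to the coordinator.
        CountDownLatch acks = new CountDownLatch(serverCnt - 1);
        // Phase 2: the coordinator's final acknowledgement to all nodes.
        CountDownLatch finalAck = new CountDownLatch(1);
        AtomicInteger finished = new AtomicInteger();   // nodes done with local updates
        AtomicInteger violations = new AtomicInteger(); // nodes that continued too early

        ExecutorService pool = Executors.newFixedThreadPool(serverCnt);
        try {
            List<Future<?>> futs = new ArrayList<>();
            for (int i = 0; i < serverCnt; i++) {
                final boolean crd = i == 0;
                final long delayMs = i * 10L;
                futs.add(pool.submit(() -> {
                    Thread.sleep(delayMs);          // 1) finish ongoing local updates
                    finished.incrementAndGet();
                    if (crd) {
                        acks.await();               // 2) wait for acks from all other nodes
                        finalAck.countDown();       // 3) send final ack to everyone
                    } else {
                        acks.countDown();           // 2) send ack to coordinator
                        finalAck.await();           // 3) wait for coordinator's final ack
                    }
                    // 4) continue PME: by now every node must have finished its updates
                    if (finished.get() != serverCnt)
                        violations.incrementAndGet();
                    return null;
                }));
            }
            for (Future<?> f : futs)
                f.get(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdown();
        }
        return violations.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("violations: " + simulate(4));
    }
}
```

In this model no node reaches step 4 before all nodes have finished step 1, which is the guarantee the second phase is meant to add. It says nothing about the real cost on PME timing, so the benchmark is still needed.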
