For now, I think the two-phase await is the only option. After the fix is
prototyped, we need to benchmark it and check the impact of this change
on PME timing.

2018-03-20 18:09 GMT+03:00 Dmitry Pavlov <[email protected]>:

> Hi Igniters,
>
> I prefer option 1 because throwing any exceptions is bad for product
> usability. I think we should resort to that only if it is unavoidable.
>
> At the same time, it would be good if we could provide a solution that is
> both reliable and optimized (in terms of message count).
>
> Please share your vision.
>
> Sincerely,
> Dmitriy Pavlov
>
> Mon, 19 Mar 2018 at 20:15, Pavel Kovalenko <[email protected]>:
>
> > Hello Igniters,
> >
> > The current implementation of the
> > GridDhtPartitionsExchangeFuture#waitPartitionRelease function doesn't
> > give us a 100% guarantee that, after this method completes, there are no
> > ongoing atomic or transactional updates on the current node during the
> > main stage of PME.
> > It only guarantees that all primary updates are finished on that node;
> > we can still receive and process backup updates after this method
> > returns.
> > An example of such a case is described in
> > https://issues.apache.org/jira/browse/IGNITE-7871
> >
> > To avoid such situations, we would like to implement a second phase of
> > the waitPartitionRelease method.
> > In this phase, every server node participating in PME should wait until
> > all other server nodes have finished their ongoing updates.
> >
> > Here is a brief description of the algorithm:
> >
> > Non-coordinator node:
> > 1) Finish all ongoing atomic & transactional updates.
> > 2) Send an acknowledgement to the coordinator.
> > 3) Wait for the final acknowledgement from the coordinator that all nodes
> > have finished their updates.
> > 4) Continue PME.
> >
> > Coordinator node:
> > 1) Finish all ongoing atomic & transactional updates.
> > 2) Wait for acknowledgements from all other server nodes.
> > 3) Send the final acknowledgement to all server nodes.
> > 4) Continue PME.
> >
> > Acknowledgement messages are tiny, so the network pressure and the
> > overall performance drop will be minimal.
> >
> > Another solution to the problem is simply cancelling atomic backup
> > updates, and transactional backup updates in the PREPARED phase, if the
> > topology version has changed.
> > But from the user's perspective, it's not correct to get transaction
> > errors even in cases when a node is just joining the cluster.
> >
> > Any thoughts?
> >
>
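A minimal, hypothetical sketch of the two-phase await proposed above, assuming an in-memory simulation in which server nodes are threads and messages are queue entries. TwoPhaseReleaseSketch, run and Result are made-up names for illustration, not Ignite API; "finishing ongoing updates" is simulated with a short sleep. The point is only to show the ordering guarantee: no node continues PME until every node has finished its local updates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory simulation of a two-phase partition release (not Ignite API). */
public class TwoPhaseReleaseSketch {
    /** Outcome of one simulated exchange. */
    public static final class Result {
        /** For each node: how many nodes had finished updates when it continued PME. */
        public final List<Integer> finishedSeenOnContinue;
        /** Total number of acknowledgement messages sent. */
        public final int messagesSent;

        Result(List<Integer> seen, int messages) {
            this.finishedSeenOnContinue = seen;
            this.messagesSent = messages;
        }
    }

    public static Result run(int serverNodes) {
        final int coordinator = 0;
        // Phase 1 inbox: acknowledgements sent by non-coordinators to the coordinator.
        BlockingQueue<Integer> coordinatorInbox = new ArrayBlockingQueue<>(serverNodes);
        // Phase 2 inboxes: one per node, for the coordinator's final acknowledgement.
        List<BlockingQueue<Boolean>> finalAckInbox = new ArrayList<>();
        for (int i = 0; i < serverNodes; i++)
            finalAckInbox.add(new ArrayBlockingQueue<>(1));

        AtomicInteger finishedNodes = new AtomicInteger();
        AtomicInteger messages = new AtomicInteger();
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());

        List<Thread> threads = new ArrayList<>();
        for (int id = 0; id < serverNodes; id++) {
            final int nodeId = id;
            Thread t = new Thread(() -> {
                try {
                    // Step 1 (all nodes): finish ongoing atomic & transactional updates (simulated).
                    Thread.sleep(5L * nodeId);
                    finishedNodes.incrementAndGet();

                    if (nodeId == coordinator) {
                        // Coordinator step 2: wait for an ack from every other server node.
                        for (int i = 1; i < serverNodes; i++)
                            coordinatorInbox.take();
                        // Coordinator step 3: send the final ack to every other server node.
                        for (int i = 1; i < serverNodes; i++) {
                            finalAckInbox.get(i).put(Boolean.TRUE);
                            messages.incrementAndGet();
                        }
                    } else {
                        // Non-coordinator step 2: acknowledge to the coordinator.
                        coordinatorInbox.put(nodeId);
                        messages.incrementAndGet();
                        // Non-coordinator step 3: block until the coordinator's final ack arrives.
                        finalAckInbox.get(nodeId).take();
                    }
                    // Step 4 (all nodes): continue PME; record how many nodes had finished.
                    seen.add(finishedNodes.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new Result(new ArrayList<>(seen), messages.get());
    }

    public static void main(String[] args) {
        Result r = run(4);
        System.out.println(r.finishedSeenOnContinue + " messages=" + r.messagesSent);
    }
}
```

Running main should print [4, 4, 4, 4] messages=6: every node continues PME only after all 4 nodes have finished their updates. In this sketch the cost is 2(N-1) small messages per exchange (one ack and one final ack per non-coordinator), which is the message count Dmitriy's question is about.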
