Mikhail Petrov created IGNITE-17731:
---------------------------------------
Summary: Possible LRT in case of postponed GridDhtLockRequest
Key: IGNITE-17731
URL: https://issues.apache.org/jira/browse/IGNITE-17731
Project: Ignite
Issue Type: Bug
Reporter: Mikhail Petrov
Let's assume the following scenario:
1. TX coordinator starts transaction and sends GridDhtLockRequest to "near"
nodes.
2. Some GridDhtLockRequest messages are delayed by the network.
3. Not all "near" nodes receive GridDhtLockRequest and, as a result, not all of
them respond to the TX coordinator.
4. TX coordinator aborts the TX on timeout.
5. Completed TX ID is stored in IgniteTxManager#completedVersHashMap.
6. TX load continues (assume puts in a TX cache) and the record about the
completed TX described above is evicted from the map.
7. GridDhtLockRequest from step 2 is finally received by the "near" nodes.
They lock keys, start the local TX, and respond to the TX coordinator.
But currently the TX coordinator ignores GridDhtLockResponse and does nothing,
because the info about the initial TX was evicted.
As a result, the near nodes keep holding key locks and waiting for the next
steps of the TX protocol, which will never happen as the TX was already completed.
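
The scenario above can be reproduced with a small toy model. This is only a
sketch under stated assumptions: none of the types below are Ignite classes,
the bounded map merely stands in for IgniteTxManager#completedVersHashMap, and
the branch where the coordinator still remembers the completed TX and releases
the remote lock is an assumed behavior, not something taken from the Ignite
code base.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the scenario; hypothetical names, not Ignite code. */
class StaleLockScenario {
    /** Bounded map that drops its oldest entry once maxSize is exceeded. */
    static <K, V> Map<K, V> boundedMap(int maxSize) {
        return new LinkedHashMap<K, V>() {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Runs steps 1-7 and returns the key locks still held on the remote node. */
    static Map<String, Long> run(int completedMapSize, int laterTxCount) {
        Map<Long, Boolean> active = new HashMap<>();                 // coordinator: active TXs
        Map<Long, Boolean> completed = boundedMap(completedMapSize); // coordinator: completed TXs
        Map<String, Long> remoteLocks = new HashMap<>();             // remote node: key -> TX id

        long delayedTx = 1L;

        // Steps 1-5: the TX starts, its lock request is delayed, and the
        // coordinator rolls the TX back on timeout and records it as completed.
        active.put(delayedTx, true);
        active.remove(delayedTx);
        completed.put(delayedTx, false);

        // Step 6: ongoing load pushes newer completed TXs into the bounded map.
        for (long tx = 2; tx < 2 + laterTxCount; tx++)
            completed.put(tx, true);

        // Step 7: the delayed request arrives; the remote node locks the key.
        remoteLocks.put("key", delayedTx);

        // The coordinator handles the lock response.
        if (completed.containsKey(delayedTx))
            remoteLocks.remove("key"); // Assumption: a known completed TX gets cleaned up.
        // Otherwise the TX is unknown, the response is dropped, and the lock stays.

        return remoteLocks;
    }
}
```

In this model, a small completed-TX map combined with enough later
transactions leaves "key" locked by the already rolled back TX, while a map
large enough to still remember that TX does not.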
As a workaround, the TX can be explicitly KILLED on the near node.
It is proposed to handle this situation and not acquire locks on the near node
if the TX coordinator or other cluster nodes have no record of the TX to which
the current lock request belongs.
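
One possible shape of the proposed check is sketched below. It is a minimal
illustration, not the actual fix: GuardedLockHandler and txKnownInCluster are
hypothetical names, and the predicate abstracts whatever lookup (asking the TX
coordinator or other cluster nodes) a real implementation would use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongPredicate;

/** Hypothetical lock request handler on the near node; not an Ignite class. */
class GuardedLockHandler {
    /** Assumed lookup: does the coordinator or any other node still know this TX? */
    private final LongPredicate txKnownInCluster;

    /** Key -> id of the TX that holds the lock. */
    private final Map<String, Long> locks = new HashMap<>();

    GuardedLockHandler(LongPredicate txKnownInCluster) {
        this.txKnownInCluster = txKnownInCluster;
    }

    /** Returns true if the lock was acquired, false if the request was rejected. */
    boolean onLockRequest(long txId, String key) {
        // Refuse to lock for a TX that nobody in the cluster knows about:
        // it has most likely completed already and will never be finished here.
        if (!txKnownInCluster.test(txId))
            return false;

        Long owner = locks.putIfAbsent(key, txId);

        return owner == null || owner == txId;
    }

    boolean isLocked(String key) {
        return locks.containsKey(key);
    }
}
```

With such a guard, a postponed lock request for an already completed TX is
rejected up front, so no key lock is left waiting for protocol steps that will
never come. The cost is an extra lookup per lock request, which a real
implementation would likely need to limit to suspicious (for example, very
late) requests.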