[
https://issues.apache.org/jira/browse/IGNITE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729634#comment-16729634
]
Vladimir Ozerov commented on IGNITE-9322:
-----------------------------------------
Hi. We conducted review of current design with [~Pavlukhin], [~gvvinblade],
[~agoncharuk], [~amashenkov] and [~agura]. Here is the outcome of this
discussion:
# We decided that deadlock detection should be triggered after new "deadlock
detection timeout" is expired. It will be added to
{{TransactionConfiguration}}. Default value of this timeout will be equal to
"failure detection timeout", which is 10s.
# We will not trigger deadlock detection on normal transaction timeout as it is
done with current deadlock detector. The reason is that our new algorithm do
not know whether timeout is happened until it receives a probe with a cycle. As
this may never happen, we would need to wait for some additional timeout *in
addition* to transaction timeout to be more or less confident that we are no in
a deadlock. This is not convenient. Alternative - to notify probe originator
that probe failed to find a deadlock explicitly. But unfortunately it is not
possible with our algorithm either - probe may travel multi-way path in case of
concurrent lock acquisition (e.g. {{GridNearTxQueryEnlistRequest}}), so we do
not know exactly when probing is completely finished.
# We need to add a counter to locking engine - when the next key is locked,
counter is advanced. Current value of the counter should be added to probe. If
cycle is detected and probe returns to us, we must double-check whether
original lock counter is still the same. If yes - start rollback. Otherwise -
ignore probe. This is needed to avoid rollback of presumably deadlocked
transaction, while in reality it is no longer in a deadlock. One case when it
may happen - concurrent timeout of rollback of another deadlocked transaction.
# When probe travels between nodes, we must collect info about all transactions
we passed through. This is needed to detect cycle when probe originator is not
a part of deadlock. E.g. consider a situation when nodes A, B, C are in a
deadlock. Then node D tries to acquire a key locked by A and eventually
triggers a probe. In this case it will travel (D -> A => B -> C -> A). Info of
all passed transactions is needed to stop the probe once we reach A for the
second time.
# Info about transaction we must collect:
#* {{xid}} - to know which transaction to kill
#* {{near_node_id}} - to know what node to ask for a kill
#* {{lock_counter}} - lock counter of current transaction to avoid
false-positive rollbacks as explained in p.3
#* {{start_time}} - start time of current transaction, to choose a victim as
explained below
# Victim transaction should be chosen based on either "youngest dies" or
"oldest dies" policy. We need to investigate literature to find out what
algorithm is better. In any case,victim will be chosen via comparison of TX
infos. First we compare {{start_time}}. If it is equal, we compare {{xid}} as
it is both comparable and unique inside a cluster. Note that due to a clock
drift this algorithm may choose a transaction which is not actually
youngest/oldest. Instead, this is best effort approach which will work
correctly most of the time. But this approach has another critical property:
given N different transactions (t1, t2 ... tN), every node will choose the same
victim, thus avoiding killing more than one transaction.
# Consider optimization: delay probe start for the given {{(xid,
lock_counter)}} pair if another probe passed through this pair recently. This
way we will decrease number of message when several nodes participating in a
deadlock want to start probing nearly simultaneously. To achieve this we must
save a timestamp of a last observed probe for the given {{(xid,
lock_counter)}}. When counter is advanced, this timestamp is cleared.
Future improvements which we consider out of scope of this ticket:
# Add SQL view and JMX beans to track deadlock detection process: number of
probes sent, number of probes received, number of detected deadlocks, etc.
# Add detailed deadlock info to exception and/or node logs to simplify debug
for users. This may contain transaction labels, locked keys (might be unsafe
from security perspective), locked statements, etc.
# Investigate other victim selection algorithms. E.g. "least/most acquired
locks" or so
# It seems that we can reuse the same mechanism for "classical" caches and
deprecate existing deadlock detector. This is a good candidate for AI 3.0 scope.
> MVCC: implement deadlock detector
> ---------------------------------
>
> Key: IGNITE-9322
> URL: https://issues.apache.org/jira/browse/IGNITE-9322
> Project: Ignite
> Issue Type: Task
> Components: mvcc
> Reporter: Vladimir Ozerov
> Assignee: Ivan Pavlukhin
> Priority: Major
>
> Deadlocks are not uncommon during SQL execution.
> We need to implement distributed deadlock detection protocol for MVCC.
> Essentially, nodes should exchange some map of tx wait lists, and try to find
> a loop. If loop is found, then one of problematic transactions should be
> rolled back.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)