[ https://issues.apache.org/jira/browse/IGNITE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729634#comment-16729634 ]

Vladimir Ozerov edited comment on IGNITE-9322 at 12/27/18 2:16 PM:
-------------------------------------------------------------------

Hi. We conducted a review of the current design with [~Pavlukhin], [~gvvinblade], 
[~agoncharuk], [~amashenkov] and [~agura]. Here is the outcome of this 
discussion:
# We decided that deadlock detection should be triggered after a new "deadlock 
detection timeout" expires. It will be added to {{TransactionConfiguration}} (a 
configuration sketch follows this list). The default value of this timeout will 
be equal to the "failure detection timeout", which is 10s.
# We will not trigger deadlock detection on the normal transaction timeout, as 
the current deadlock detector does. The reason is that our new algorithm does 
not know whether a deadlock has happened until it receives a probe with a 
cycle. As this may never happen, we would need to wait for some additional 
timeout *in addition* to the transaction timeout to be more or less confident 
that we are not in a deadlock. This is not convenient. An alternative is to 
explicitly notify the probe originator that the probe failed to find a 
deadlock. Unfortunately, this is not possible with our algorithm either: a 
probe may travel a multi-way path in case of concurrent lock acquisition (e.g. 
{{GridNearTxQueryEnlistRequest}}), so we do not know exactly when probing has 
completely finished.
# We need to add a counter to the locking engine: when the next key is locked, 
the counter is advanced. The current value of the counter should be added to 
the probe. If a cycle is detected and the probe returns to us, we must 
double-check whether the original lock counter is still the same. If yes, start 
rollback; otherwise, ignore the probe. This is needed to avoid rolling back a 
presumably deadlocked transaction that in reality is no longer in a deadlock. 
One case when this may happen is a concurrent timeout or rollback of another 
deadlocked transaction (see the probe handling sketch after this list).
# When a probe travels between nodes, we must collect info about all 
transactions it passed through. This is needed to detect a cycle when the 
probe originator is not part of the deadlock. E.g. consider a situation where 
nodes A, B, C are in a deadlock. Then node D tries to acquire a key locked by 
A and eventually triggers a probe. In this case it will travel (D -> A -> B -> 
C -> A). The info about all passed transactions is needed to stop the probe 
once it reaches A for the second time (also covered by the probe handling 
sketch below).
# Info about each transaction we must collect (sketched in code after this 
list): 
#* {{xid}} - to know which transaction to kill
#* {{near_node_id}} - to know which node to ask for the kill
#* {{lock_counter}} - lock counter of the current transaction, to avoid 
false-positive rollbacks as explained in p.3
#* {{start_time}} - start time of the current transaction, to choose a victim 
as explained below
# The victim transaction should be chosen based on either a "youngest dies" or 
an "oldest dies" policy. We need to investigate the literature to find out 
which algorithm is better. In any case, the victim will be chosen via 
comparison of TX infos (see the comparator sketch after this list). First we 
compare {{start_time}}. If it is equal, we compare {{xid}}, as it is both 
comparable and unique inside a cluster. Note that due to clock drift this 
algorithm may choose a transaction which is not actually the youngest/oldest. 
It is a best-effort approach which will work correctly most of the time. But 
it has another critical property: given N different transactions (t1, t2 ... 
tN), every node will choose the same victim, thus avoiding killing more than 
one transaction.
# Consider an optimization: delay the probe start for a given {{(xid, 
lock_counter)}} pair if another probe passed through this pair recently. This 
way we will decrease the number of messages when several nodes participating 
in a deadlock want to start probing nearly simultaneously. To achieve this we 
must save a timestamp of the last observed probe for the given {{(xid, 
lock_counter)}}; when the counter is advanced, this timestamp is cleared (see 
the throttle sketch after this list).
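
A minimal configuration sketch for p.1. The {{setDeadlockTimeout}} setter name 
is an assumption based on the proposal above, not a confirmed API:

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;

public class DeadlockTimeoutConfig {
    public static void main(String[] args) {
        TransactionConfiguration txCfg = new TransactionConfiguration();

        // Hypothetical property from p.1: how long a lock request may wait
        // before deadlock detection is triggered. 10s mirrors the default
        // failure detection timeout.
        txCfg.setDeadlockTimeout(10_000);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setTransactionConfiguration(txCfg);

        Ignition.start(cfg);
    }
}
{code}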
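
The TX info from p.5 could be carried by the probe as a small immutable record. 
A sketch, assuming plain {{UUID}} identifiers (the real implementation would 
likely use Ignite's own id types); the following sketches build on it:

{code:java}
import java.util.UUID;

/** Per-transaction info accumulated by a probe (p.5). */
public class TxInfo {
    final UUID xid;          // which transaction to kill
    final UUID nearNodeId;   // which node to ask for the kill
    final long lockCounter;  // guards against false-positive rollbacks (p.3)
    final long startTime;    // used for victim selection (p.6)

    TxInfo(UUID xid, UUID nearNodeId, long lockCounter, long startTime) {
        this.xid = xid;
        this.nearNodeId = nearNodeId;
        this.lockCounter = lockCounter;
        this.startTime = startTime;
    }
}
{code}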
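
The deterministic victim choice from p.6 boils down to a total order over TX 
infos. A sketch of the "youngest dies" variant ("oldest dies" would take the 
minimum instead):

{code:java}
import java.util.Collection;
import java.util.Comparator;

public class VictimSelection {
    /** Total order from p.6: start_time first, xid as the unique tie-breaker. */
    static final Comparator<TxInfo> TX_ORDER =
        Comparator.comparingLong((TxInfo t) -> t.startTime)
                  .thenComparing(t -> t.xid);

    /**
     * "Youngest dies": kill the transaction with the latest start_time.
     * Because the order is total and unique cluster-wide, every node
     * evaluating the same cycle picks the same victim.
     */
    static TxInfo chooseVictim(Collection<TxInfo> cycle) {
        return cycle.stream().max(TX_ORDER)
            .orElseThrow(IllegalStateException::new);
    }
}
{code}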
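
How a node could process an incoming probe, combining the cycle check from p.4 
with the lock counter check from p.3. All helper methods are hypothetical 
stand-ins for the real messaging and locking hooks:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ProbeHandlerSketch {
    /** Called when a probe arrives at the transaction described by localTx. */
    void onProbe(List<TxInfo> path, TxInfo localTx) {
        // p.4: a repeated xid in the path means the probe closed a cycle.
        // In the (D -> A -> B -> C -> A) example the probe stops at the
        // second A, and only (A, B, C) form the cycle; D is not part of it.
        int first = -1;
        for (int i = 0; i < path.size() && first < 0; i++) {
            if (path.get(i).xid.equals(localTx.xid))
                first = i;
        }

        if (first >= 0) {
            List<TxInfo> cycle = path.subList(first, path.size());
            TxInfo victim = VictimSelection.chooseVictim(cycle);

            // p.3: roll back only if the victim's lock counter is unchanged,
            // i.e. it is still blocked on the same lock as when it was probed.
            if (victim.lockCounter == liveLockCounter(victim))
                requestRollback(victim);

            return; // a probe that closed a cycle is not forwarded further
        }

        // No cycle yet: append ourselves and forward along the wait-for edges.
        List<TxInfo> extended = new ArrayList<>(path);
        extended.add(localTx);
        forward(extended);
    }

    // Hypothetical hooks.
    long liveLockCounter(TxInfo tx) { return tx.lockCounter; }
    void requestRollback(TxInfo victim) { /* ask victim.nearNodeId to kill xid */ }
    void forward(List<TxInfo> path) { /* send to the holder(s) of the awaited lock */ }
}
{code}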
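
And the optimization from p.7 as a sketch: a per-{{(xid, lock_counter)}} 
timestamp of the last observed probe, consulted before starting a new one. The 
key encoding and the delay value are assumptions:

{code:java}
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class ProbeThrottle {
    private static final long RECENT_NANOS = 100_000_000L; // 100 ms, illustrative

    private final Map<String, Long> lastProbeTs = new ConcurrentHashMap<>();

    private static String key(UUID xid, long lockCounter) {
        return xid + ":" + lockCounter;
    }

    /** p.7: skip starting a probe if another one passed through recently. */
    boolean shouldStartProbe(UUID xid, long lockCounter) {
        Long ts = lastProbeTs.get(key(xid, lockCounter));
        return ts == null || System.nanoTime() - ts > RECENT_NANOS;
    }

    /** Record every probe observed for the pair. */
    void onProbeObserved(UUID xid, long lockCounter) {
        lastProbeTs.put(key(xid, lockCounter), System.nanoTime());
    }

    /** p.7: advancing the counter invalidates the old pair's timestamp. */
    void onLockCounterAdvanced(UUID xid, long oldLockCounter) {
        lastProbeTs.remove(key(xid, oldLockCounter));
    }
}
{code}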

Future improvements which we consider out of scope of this ticket:
# Add a SQL view and JMX beans to track the deadlock detection process: number 
of probes sent, number of probes received, number of detected deadlocks, etc.
# Add detailed deadlock info to the exception and/or node logs to simplify 
debugging for users. This may contain transaction labels, locked keys (might 
be unsafe from a security perspective), locked statements, etc.
# Investigate other victim selection algorithms, e.g. "least/most acquired 
locks".
# It seems that we can reuse the same mechanism for "classical" caches and 
deprecate the existing deadlock detector. This is a good candidate for the AI 
3.0 scope.


> MVCC: implement deadlock detector
> ---------------------------------
>
>                 Key: IGNITE-9322
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9322
>             Project: Ignite
>          Issue Type: Task
>          Components: mvcc
>            Reporter: Vladimir Ozerov
>            Assignee: Ivan Pavlukhin
>            Priority: Major
>
> Deadlocks are not uncommon during SQL execution.
> We need to implement a distributed deadlock detection protocol for MVCC. 
> Essentially, nodes should exchange some map of tx wait lists and try to find 
> a loop. If a loop is found, then one of the problematic transactions should 
> be rolled back.


