[
https://issues.apache.org/jira/browse/IGNITE-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222229#comment-15222229
]
Andrey Gura edited comment on IGNITE-2854 at 4/5/16 11:18 PM:
--------------------------------------------------------------
The algorithm described in the previous comment has drawbacks.
It cannot detect a deadlock for a transaction that has timed out and is involved
in the deadlock, or it can report an invalid deadlock because of race conditions.
For example, suppose transactions {{TX1}} and {{TX2}} have the same timeout and
start time. {{TX1}} holds the lock on key {{K1}} and requests the lock on {{K2}},
while {{TX2}} holds the lock on key {{K2}} and requests the lock on {{K1}}, so
this is a deadlock. {{K1}} and {{K2}} have different primary nodes, so both
transactions are distributed.
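The example above can be pictured as a cycle in a wait-for graph. The following is a minimal sketch under stated assumptions, not Ignite code: the class {{WaitForGraphSketch}} and its methods are hypothetical names introduced only for illustration, each transaction is assumed to wait for at most one key, and everything lives in one process instead of being spread over primary nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of Ignite. */
final class WaitForGraphSketch {
    /** key -> transaction that currently holds the lock on that key. */
    private final Map<String, String> owners = new HashMap<>();

    /** transaction -> key it is blocked on (at most one key in this sketch). */
    private final Map<String, String> requests = new HashMap<>();

    void holds(String tx, String key) {
        owners.put(key, tx);
    }

    void waitsFor(String tx, String key) {
        requests.put(tx, key);
    }

    /**
     * Follows the edges "tx -> requested key -> owner of that key".
     * Returns the cycle that starts and ends at {@code start}, or an empty list
     * if the chain ends or loops back to some other transaction.
     */
    List<String> findCycle(String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String cur = start;

        while (cur != null && seen.add(cur)) {
            path.add(cur);
            String key = requests.get(cur);
            cur = key == null ? null : owners.get(key);
        }

        if (start.equals(cur)) {
            path.add(start);
            return path;
        }

        return new ArrayList<>();
    }

    public static void main(String[] args) {
        WaitForGraphSketch g = new WaitForGraphSketch();
        g.holds("TX1", "K1");
        g.holds("TX2", "K2");
        g.waitsFor("TX1", "K2");
        g.waitsFor("TX2", "K1");

        System.out.println(g.findCycle("TX1")); // prints [TX1, TX2, TX1]
    }
}
```

In the real system the two maps would be assembled from information held on different nodes, which is why the timing issues described in this comment matter.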
When {{TX1}} and {{TX2}} time out, all {{GridDhtColocatedLockFuture}}s and
blocked {{GridDhtLockFuture}}s time out as well. {{GridDhtLockFuture.onTimeout}}
initiates deadlock detection, while {{GridDhtColocatedLockFuture.onTimeout}}
releases locks and then rolls back the corresponding transaction. As a result,
we have incomplete information about the transactions' state and either fail to
detect the deadlock or detect something invalid like {{TX1 <-> TX1}}.
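A minimal sketch of how such a race could distort the picture follows. It is an assumption-laden illustration, not Ignite code: {{StaleSnapshotSketch}} and {{blockedBy}} are hypothetical names, and the third step (the released key being granted to {{TX1}} while its old request is still visible) is only one assumed way a self-referencing result might arise; the comment does not say exactly how it happens.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Ignite. */
final class StaleSnapshotSketch {
    /** Returns the transaction that {@code tx} appears to wait for, or null if there is none. */
    static String blockedBy(String tx, Map<String, String> requests, Map<String, String> owners) {
        String key = requests.get(tx);
        return key == null ? null : owners.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> requests = new HashMap<>(); // transaction -> requested key
        Map<String, String> owners = new HashMap<>();   // key -> lock owner

        requests.put("TX1", "K2");
        requests.put("TX2", "K1");
        owners.put("K1", "TX1");
        owners.put("K2", "TX2");

        // 1. Consistent view: TX1 waits for TX2.
        System.out.println(blockedBy("TX1", requests, owners)); // prints TX2

        // 2. TX2 has already released its lock and rolled back: the edge is gone,
        //    so a detector looking at this state misses the deadlock.
        owners.remove("K2");
        requests.remove("TX2");
        System.out.println(blockedBy("TX1", requests, owners)); // prints null

        // 3. Assumed scenario: K2 is now granted to TX1 while the stale request
        //    of TX1 for K2 is still read, which yields a TX1 -> TX1 edge.
        owners.put("K2", "TX1");
        System.out.println(blockedBy("TX1", requests, owners)); // prints TX1
    }
}
```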
The second problem is that in the current implementation remote nodes do not
send a response to the near node when {{GridDhtLockFuture}} times out, so we
can't print deadlock information in the user thread.
Suggested solution:
Deadlock detection is initiated by the near node when
{{GridDhtColocatedNearFuture.onTimeout}} is invoked. At the same time, each
{{GridDhtLockFuture}} registers a future in the transaction manager. These
futures will be completed when a special request signalling that detection has
finished is received from the near node.
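The registration step could look roughly like the sketch below. It is not the Ignite transaction manager: {{DetectionFutureRegistrySketch}}, {{register}} and {{onDetectionFinished}} are hypothetical names, transaction ids and the detection report are plain strings, and the standard {{CompletableFuture}} stands in for Ignite's own future types.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not the Ignite transaction manager. */
final class DetectionFutureRegistrySketch {
    /** transaction id -> future completed once the near node reports that detection finished. */
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Called by a timed-out remote lock future instead of cleaning up immediately. */
    CompletableFuture<String> register(String txId) {
        return pending.computeIfAbsent(txId, id -> new CompletableFuture<>());
    }

    /**
     * Called when the (assumed) "detection finished" request arrives from the near node.
     * Returns true if a registered future was completed by this call.
     */
    boolean onDetectionFinished(String txId, String report) {
        CompletableFuture<String> fut = pending.remove(txId);
        return fut != null && fut.complete(report);
    }

    public static void main(String[] args) {
        DetectionFutureRegistrySketch registry = new DetectionFutureRegistrySketch();

        // The remote side defers its cleanup until detection has finished.
        registry.register("TX1")
            .thenAccept(report -> System.out.println("Cleaning up TX1 after: " + report));

        // Later, the near node signals that detection is done.
        registry.onDetectionFinished("TX1", "deadlock: TX1 -> TX2 -> TX1");
    }
}
```

Deferring cleanup this way keeps lock state visible while detection runs, which is the property the suggested solution relies on.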
Race conditions are still possible, because a concurrent deadlock detection
process will be started for each timed-out transaction.
> Need to implement deadlock detection
> ------------------------------------
>
> Key: IGNITE-2854
> URL: https://issues.apache.org/jira/browse/IGNITE-2854
> Project: Ignite
> Issue Type: New Feature
> Components: cache
> Affects Versions: 1.5.0.final
> Reporter: Valentin Kulichenko
> Assignee: Andrey Gura
> Fix For: 1.6
>
>
> Currently, if a transactional deadlock occurs, there is no easy way to find
> out which locks were reordered.
> We need to add a mechanism that will collect information about awaiting
> candidates, analyze it and show the guilty keys. Most likely this should be
> implemented with the help of a custom discovery message.
> In addition, we should automatically execute this mechanism if a transaction
> times out and add the information to the timeout exception.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)