[
https://issues.apache.org/jira/browse/DERBY-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005128#comment-13005128
]
Knut Anders Hatlen commented on DERBY-5073:
-------------------------------------------
The code is in Deadlock.handle():
    // See if the checker is in the deadlock and we
    // already picked as a victim
    if ((checker.equals(space)) && (deadlockWake ==
            Constants.WAITING_LOCK_DEADLOCK)) {
        victim = checker;
        break;
    }
That check never kicks in. Instead, execution continues further down in the
method and wakes another victim:
    ActiveLock victimLock = (ActiveLock) waiters.get(victim);
    victimLock.wakeUp(Constants.WAITING_LOCK_DEADLOCK);
The new victim wakes up from its waiting state in
ActiveLock.waitForGrant()/ConcurrentLockSet.lockObject(), calls checkDeadlock()
and ends up in Deadlock.handle() again.
I think the problem may be caused by the following piece of code in
Deadlock.look():
    } else {
        // simply waiting on another waiter
        space = waitingLock.getCompatabilitySpace();
    }
As far as I can see, this code doesn't make any sense. space will already have
the same value as waitingLock.getCompatabilitySpace(), so the operation is
actually a no-op. (waitingLock is obtained by calling waiters.get(space), and
the waiters Map is built up by (waitingLock.getCompatabilitySpace(),
waitingLock) value pairs, see LockControl.addWaiters().) Furthermore, this
leads to "space" being considered twice in a row by the deadlock detection, so
that it thinks that the transaction owning that compatibility space is waiting
for one of its own locks. It therefore detects the deadlock prematurely,
before it has seen all transactions involved in it, and incorrectly concludes
that the original victim wasn't involved.
By changing that last piece of code from a no-op to actually moving one step
ahead in the wait graph, the repro does fail with a deadlock error. That is,
change the assignment to:
    space = ((ActiveLock) waitOn).getCompatabilitySpace();
I tried running the regression tests with that change, and they all passed. I
do find the deadlock detection code a bit hard to follow, so I'm not totally
convinced this is the right change.
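To illustrate the effect described above, here is a minimal sketch. It is not
Derby's algorithm: the class WaitGraphSketch, the method findCycle(), and the
"advance" flag are hypothetical names invented for this example, and the wait
graph is simplified so that each compatibility space waits for exactly one
other space. It only shows why re-assigning the current space to itself makes
a traversal see that space twice in a row and report a cycle before the other
participants have been visited.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a wait graph; not Derby code.
class WaitGraphSketch {

    /**
     * Follows the chain of waiters starting at 'start' and returns the
     * spaces that form the first cycle found, or an empty list if the
     * chain ends without a cycle.
     *
     * @param waitsFor maps each space to the single space it waits for
     * @param advance  true: step to the space being waited on;
     *                 false: keep the current space (the no-op assignment)
     */
    static List<String> findCycle(Map<String, String> waitsFor,
                                  String start, boolean advance) {
        List<String> chain = new ArrayList<>();
        String space = start;
        while (space != null) {
            int seenAt = chain.indexOf(space);
            if (seenAt >= 0) {
                // The space was visited before: everything from its first
                // occurrence onwards is reported as the cycle.
                return new ArrayList<>(chain.subList(seenAt, chain.size()));
            }
            chain.add(space);
            if (advance) {
                // Move one step ahead in the wait graph.
                space = waitsFor.get(space);
            }
            // Otherwise 'space' is unchanged, so the next iteration
            // considers the same space again.
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // T1 waits for T2, T2 waits for T3, T3 waits for T1.
        Map<String, String> waitsFor = new HashMap<>();
        waitsFor.put("T1", "T2");
        waitsFor.put("T2", "T3");
        waitsFor.put("T3", "T1");

        // No-op step: T1 is seen twice in a row, so the "cycle" is just T1.
        System.out.println(findCycle(waitsFor, "T1", false)); // prints [T1]

        // Stepping ahead: all three participants are found.
        System.out.println(findCycle(waitsFor, "T1", true));  // prints [T1, T2, T3]
    }
}
```

In this simplified model, the no-op variant reports a cycle that contains only
the starting space, which is analogous to the premature detection described
above. The real code in Deadlock.look() also has to handle granted locks and
multiple waiters, which this sketch leaves out.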
> Derby deadlocks without recourse on simultaneous correlated subqueries
> ----------------------------------------------------------------------
>
> Key: DERBY-5073
> URL: https://issues.apache.org/jira/browse/DERBY-5073
> Project: Derby
> Issue Type: Bug
> Components: Services
> Affects Versions: 10.0.2.1, 10.1.2.1, 10.2.2.0, 10.3.3.0, 10.4.2.0,
> 10.5.3.0, 10.6.2.1, 10.7.1.1, 10.8.0.0
> Reporter: Karl Wright
> Attachments: Derby5073.java
>
>
> When the following two queries are run against tables that contain the
> necessary fields, using multiple threads, Derby deadlocks and none of the
> queries ever returns. Derby apparently detects no deadlock condition, either.
> SELECT t0.* FROM jobqueue t0 WHERE EXISTS(SELECT 'x' FROM carrydown t1 WHERE
> t1.parentidhash IN (?) AND t1.childidhash=t0.dochash AND t0.jobid=t1.jobid)
> AND t0.jobid=?
> SELECT t0.* FROM jobqueue t0 WHERE EXISTS(SELECT 'x' FROM carrydown t1 WHERE
> t1.parentidhash IN (?) AND t1.childidhash=t0.dochash AND t0.jobid=t1.jobid
> AND t1.newField=?) AND t0.jobid=?
> This code comes from Apache ManifoldCF, and has occurred when there are five
> or more threads trying to execute these two queries at the same time.
> Originally we found this on 10.5.3.0. It was hoped that 10.7.1.1 would fix
> the problem, but it hasn't.