Mike Matrigali wrote:
I have seen this fail 2 or 3 times this week with various deadlocks; my
assumption is that the problem is a test problem and that it needs to
be changed to handle deadlocks.
I haven't seen requests for extra info from new cases, so my assumption
is that no new value is being gained by others running into this known
JIRA issue.
Is it time to move this out of the suite until a fix is submitted?
I am sorry that it has taken me some time to report back on this. I
have been on the road a lot lately (for various reasons), and I am
trying to catch up now. To investigate this issue, I have
run the test with some tracing to see what caused the lock timeouts. I
have not quite got to the bottom of it, but so far it seems to me that
it is not a deadlock scenario, but just timeouts due to long queues on
the dictionary lock. (See below for more info.)
Since creating 100 tables in parallel is not a common scenario, I am not
sure whether it is worth the effort to attempt to fix this so the test runs
cleanly. I was about to suggest that we should just remove the test
from derbyall. The test was written to verify the fix for DERBY-230, a
problem that I do not think is very likely to reoccur. Unless someone
protests, this is
what I will do.
A more detailed description of what I have found:
When a thread tries to create a table, it will first get a shared lock
on the dictionary (DataDictionaryImpl.startReading). This is released
before it tries to lock the dictionary exclusively. The way
DataDictionaryImpl.startWriting works is that it first checks whether
someone is holding a lock on the dictionary. If so, it will sleep for a
while and then try again. This goes on for a while until it gets
impatient and actually requests an exclusive lock and enters the lock
queue. In the meantime, a lot more threads have acquired a shared lock
and the updating thread will have to wait for all of them to release it.
This causes the thread to time out. I have not tested whether it
would help if we did not allow readers to acquire locks
while a writer is waiting, and I do not know what general consequences
that may have. However, since this does not seem to create problems for
normal load, I doubt that it is worth the effort to do anything about it.
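
To make the scenario above concrete, here is a minimal Java sketch of the
two policies. It is not Derby's code: the class DictionaryLockSketch, its
methods, and the writerPreference flag are hypothetical names invented for
illustration, and the non-blocking try-methods stand in for the real lock
manager. With writerPreference set to false, readers keep getting in while
a writer waits, as in the behaviour I traced. With it set to true, readers
are refused once a writer has announced itself, which is the untested
alternative.

```java
// Hypothetical sketch only; this is NOT DataDictionaryImpl or Derby's lock manager.
final class DictionaryLockSketch {
    private final boolean writerPreference; // assumption: policy switch for the sketch
    private int activeReaders = 0;
    private boolean writerWaiting = false;
    private boolean writerActive = false;

    DictionaryLockSketch(boolean writerPreference) {
        this.writerPreference = writerPreference;
    }

    // Shared access: always refused while a writer holds the lock, and also
    // refused while a writer is waiting if writerPreference is enabled.
    synchronized boolean tryStartReading() {
        if (writerActive || (writerPreference && writerWaiting)) {
            return false;
        }
        activeReaders++;
        return true;
    }

    synchronized void doneReading() {
        if (activeReaders == 0) {
            throw new IllegalStateException("no active reader");
        }
        activeReaders--;
    }

    // The writer declares its intent before it can actually get the lock.
    synchronized void announceWriter() {
        writerWaiting = true;
    }

    // Exclusive access: granted only when no reader or writer is active.
    synchronized boolean tryStartWriting() {
        if (writerActive || activeReaders > 0) {
            return false;
        }
        writerWaiting = false;
        writerActive = true;
        return true;
    }

    synchronized void doneWriting() {
        writerActive = false;
    }

    synchronized int activeReaders() {
        return activeReaders;
    }
}
```

In the sketch, a writer that has announced itself under the first policy can
be held off indefinitely as long as at least one new reader arrives before
the previous ones finish. Under the second policy the writer only has to
wait for the readers that were already in, at the cost of stalling new
readers behind it.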
--
Øystein