[
https://issues.apache.org/jira/browse/DERBY-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Knut Anders Hatlen updated DERBY-3493:
--------------------------------------
Attachment: d3493-1a.diff
Attaching a patch which I believe solves the hang.
The patch basically makes ConcurrentCache.create() use ConcurrentHashMap.get()
directly instead of going through ConcurrentCache.getEntry(), which will block
until the identity has been set. Then create() fails immediately if the object
already exists in the cache. Since this introduced yet another difference
between find() and create() in findOrCreateObject(), I also followed Øystein's
suggestion from his review of DERBY-2911 and split findOrCreateObject() into a
number of smaller methods, which I think makes the code easier to follow.
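
To illustrate the difference between the two paths, here is a minimal sketch, not the code in d3493-1a.diff. The class FailFastCacheSketch, its Entry type, and the use of IllegalStateException are all hypothetical stand-ins. It uses ConcurrentHashMap.putIfAbsent() where the patch uses get(), to keep the example short. The point is only that find() may wait for a half-initialized entry, while create() fails as soon as it sees one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/** Hypothetical, simplified cache; not Derby's actual ConcurrentCache. */
class FailFastCacheSketch {

    /** A cache entry that is usable only after its identity has been set. */
    static final class Entry {
        private final CountDownLatch identitySet = new CountDownLatch(1);
        private volatile Object value;

        void setIdentity(Object v) {
            value = v;
            identitySet.countDown();
        }

        Object waitForIdentity() {
            try {
                identitySet.await();          // blocks until setIdentity() has run
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
            return value;
        }
    }

    private final ConcurrentHashMap<Object, Entry> map = new ConcurrentHashMap<>();

    /** find(): may wait for a half-initialized entry; null if absent. */
    Object find(Object key) {
        Entry e = map.get(key);
        return (e == null) ? null : e.waitForIdentity();
    }

    /** create(): never waits; fails at once if the key is already present. */
    void create(Object key, Object value) {
        Entry fresh = new Entry();
        if (map.putIfAbsent(key, fresh) != null) {
            throw new IllegalStateException("already in cache: " + key);
        }
        fresh.setIdentity(value);
    }
}
```

In this sketch a second create() for the same key throws even if the first caller has not yet reached setIdentity().
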
I have started the full regression suite (which seems to run fine) and will
also have stress.multi running in a loop for some time to verify that the hang
really has been fixed.
The hang seems to have been caused by the two table descriptor caches in
DataDictionaryImpl (nameTdCache and OIDTdCache) trying to keep each other
consistent. When you insert an object into one of these caches, their
setIdentity() methods try to automatically insert it into the other one as
well. What happened was that one thread inserted an object into one of the
caches, and at the same time another thread inserted an object with the same
identity into the other cache. Both of the caches tried to update the same
object in the other cache at the same time and thereby they ended up waiting
for each other to finish. Since creating an object that already exists should
fail, there's no reason to wait for a not fully initialized object to become
fully initialized before failing. By failing as soon as such a situation is
detected, the two threads don't wait for each other to finish, and the deadlock
is avoided.
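
The interleaving described above can be modelled roughly as follows. This is a sketch under assumptions, not Derby code: MirroredCachesSketch, the PENDING marker, the shared key "T1", and the CyclicBarrier (used only to force both threads to reserve their own entry first) are all made up for illustration. The real caches are keyed by name and by OID respectively. Because the mirroring insert returns immediately when it finds an entry, even an uninitialized one, both threads finish.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of two caches that mirror inserts into each other. */
class MirroredCachesSketch {
    /** Placeholder for an entry that is reserved but not yet initialized. */
    private static final Object PENDING = new Object();

    final ConcurrentHashMap<String, Object> nameCache = new ConcurrentHashMap<>();
    final ConcurrentHashMap<String, Object> oidCache = new ConcurrentHashMap<>();

    // Demo aid only: forces both threads to reserve their own entry before
    // either of them touches the other cache.
    private final CyclicBarrier bothReserved = new CyclicBarrier(2);

    /** Fail-fast create: returns false immediately if the key is taken. */
    private static boolean reserve(ConcurrentHashMap<String, Object> cache, String key) {
        return cache.putIfAbsent(key, PENDING) == null;
    }

    /** Inserts into 'own' and tries to mirror the object into 'other'. */
    void createAndMirror(ConcurrentHashMap<String, Object> own,
                         ConcurrentHashMap<String, Object> other,
                         String key, Object value) {
        if (!reserve(own, key)) {
            return;                       // already exists: fail, do not wait
        }
        try {
            bothReserved.await(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            // Barrier timed out or broke (e.g. single-threaded use); carry on.
        }
        if (reserve(other, key)) {        // returns at once, even if PENDING
            other.put(key, value);
        }
        own.put(key, value);              // initialization complete
    }

    /** Runs the two racing inserts; true if both threads terminated. */
    static boolean runScenario(MirroredCachesSketch c) {
        Thread a = new Thread(() -> c.createAndMirror(c.nameCache, c.oidCache, "T1", "td"));
        Thread b = new Thread(() -> c.createAndMirror(c.oidCache, c.nameCache, "T1", "td"));
        a.start();
        b.start();
        try {
            a.join(10_000);
            b.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) {
        MirroredCachesSketch c = new MirroredCachesSketch();
        System.out.println("both threads finished: " + runScenario(c));
    }
}
```

If reserve() instead waited for a PENDING entry to become initialized, each thread would be waiting on the entry the other one is still holding, which is the hang seen in stress.multi.
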
> stress.multi times out waiting on testers with blocked testers waiting on the
> same statement
> --------------------------------------------------------------------------------------------
>
> Key: DERBY-3493
> URL: https://issues.apache.org/jira/browse/DERBY-3493
> Project: Derby
> Issue Type: Bug
> Components: Regression Test Failure, SQL, Test
> Affects Versions: 10.4.0.0
> Environment: IBM 1.5 Linux
> Reporter: Kathey Marsden
> Assignee: Knut Anders Hatlen
> Attachments: d3493-1a.diff, threaddump-1204806990660.tdump
>
>
> The diff is:
> 7 del
> < ...running last checks via final.sql
> 7 add
> > ...timed out trying to kill all testers,
> > skipping last scripts (if any). NOTE: the
> > likely cause of the problem killing testers is
> > probably not enough VM memory OR test cases that
> > run for very long periods of time (so testers do not
> > have a chance to notice stop() requests
> Test Failed.
> The testers that are stuck are stuck on the same statement e.g.
> --
> update main2 set y = 'zzz' where x = 5;
> ERROR 08000: Connection closed by unknown interrupt.
> ERROR XJ001: Java exception: ': java.lang.InterruptedException'.
> The interrupt exception shows:
> java.lang.InterruptedException
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:199)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:195)
> at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:88)
> at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:768)
> at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:606)
> at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:555)
> at org.apache.derby.impl.tools.ij.ij.executeImmediate(ij.java:329)
> at org.apache.derby.impl.tools.ij.utilMain.doCatch(utilMain.java:508)
> at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:350)
> The code at line 195 of GenericStatement shows:
> ....
> try {
> preparedStmt.wait();
> } catch (InterruptedException ie) {
> throw StandardException.interrupt(ie);
> }
> My first guess is that this is perhaps some type of Statement cache
> concurrency bug, but perhaps
> I am reading it wrong.