[
https://issues.apache.org/jira/browse/DERBY-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367630#comment-15367630
]
Brett Bergquist commented on DERBY-6879:
----------------------------------------
I will update the patch, Brian.
"1" snuck in there via my IDE.
"2" I am working on Windows and the change to "derbyTesting.jar.lastcontents"
was done by running the Ant target "refreshjardriftcheck", which was prompted
when it saw the new "DeadlockWatchdog.class". So any re-ordering was done by
that target, possibly because of being developed on Windows. I can manually
fix that file so that only the "DeadlockWatchdog.class" is added.
"3" Is there some document on the "sane" versus "insane" builds? I understand
what they are doing, and by looking at the "build.xml" I see how to trigger
each, but I can only find one reference in the "BUILDING.html". There seems
to me to be a lot of "magic" in building/testing that I found to be an
impediment to contributing.
"4" The IDE snuck this in when reformatting this section according to my coding
guidelines. Looking through the history of the code when I was comparing the
source from 10.10.2.0 to the trunk, I saw braces being removed for no apparent
reason. On the change between revisions 1364916 and 1364917, I see a brace
being added for a bare "if".
Personally, having done software development since 1984, I have come to the
conclusion that always having the braces has no downside and has an upside
when code is added to an "if" (it ensures that the code is added to the body
of the "if" and you don't have to add anything else). Also, according to the
coding standard referenced on
http://wiki.apache.org/db-derby/DerbyContributorChecklist
which links to:
http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-142311.html#449
having the braces is in line with the coding standards. I will remove the
added braces, however.
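To illustrate the point about a bare "if", here is a minimal sketch. The class
and method names are hypothetical and are not taken from the Derby source; the
second increment stands in for a statement added later by someone who relied
on the indentation.

```java
// Hypothetical illustration only; not Derby code.
class BareIfSketch {

    // Bare "if": only the first statement is guarded by the condition.
    static int withoutBraces(boolean failed) {
        int cleanups = 0;
        if (failed)
            cleanups++;   // original statement, guarded by the "if"
            cleanups++;   // added later; runs unconditionally despite the indentation
        return cleanups;
    }

    // Braced "if": a statement added inside the braces stays guarded.
    static int withBraces(boolean failed) {
        int cleanups = 0;
        if (failed) {
            cleanups++;   // original statement
            cleanups++;   // added later; still guarded by the "if"
        }
        return cleanups;
    }

    public static void main(String[] args) {
        System.out.println("without braces, failed=false: " + withoutBraces(false));
        System.out.println("with braces,    failed=false: " + withBraces(false));
    }
}
```

With failed=false the unbraced version still performs one cleanup, while the
braced version performs none.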
As far as further review goes, just a sanity check on the change is good.
The original problem to be solved is that a connection performing an XA
transaction that discovers an error that must be cleaned up will hold a lock
on an EmbedConnection and will then require a lock on the XATransactionState.
If the same XA transaction times out (because of the XA transaction timer
value) while the original XA transaction is still being worked, the timeout
handling will hold a lock on the XATransactionState and then require a lock on
the EmbedConnection, triggering the deadlock.
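The lock ordering described above can be reproduced in isolation. The sketch
below is not Derby code: two plain Objects stand in for the EmbedConnection and
XATransactionState monitors, the thread names are made up, and a latch is used
only to force the unlucky interleaving. It asks the JVM (via ThreadMXBean) how
many threads it considers deadlocked on monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; the lock objects are stand-ins, not Derby classes.
class LockOrderInversionSketch {

    // Starts a daemon thread that locks 'first', waits until the other thread
    // also holds its first lock, and then tries to lock 'second'.
    static void startThread(String name, Object first, Object second,
                            CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Not reached: the other thread holds 'second'.
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        t.start();
    }

    // Returns how many threads the JVM reports as deadlocked on monitors,
    // or 0 if none are reported within about five seconds.
    static int countDeadlockedThreads() {
        Object connection = new Object();        // stands in for EmbedConnection
        Object transactionState = new Object();  // stands in for XATransactionState
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Statement execution path: connection first, then transaction state.
        startThread("worker", connection, transactionState, bothHoldFirst);
        // Timeout path: transaction state first, then connection.
        startThread("timer", transactionState, connection, bothHoldFirst);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = threads.findMonitorDeadlockedThreads();
                if (ids != null) {
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

The two threads take the same pair of monitors in opposite order, which is the
same shape as the "DRDAConnThread_34" and "Timer-0" entries in the thread dump.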
The change in the patch fixes this problem in the smallest way that I know of.
It alters the locking on the XATransactionState when invoked via the timeout
by removing the synchronization on the "cancel" method and instead
synchronizing inline on the "XATransactionState" only while its internals are
being altered, then releasing the lock to allow EmbedConnection.rollback to be
invoked without the lock, and finally acquiring the lock again to finish
cleaning up the "XATransactionState".
> Engine deadlock between XA timeout handling and cleanupOnError
> --------------------------------------------------------------
>
> Key: DERBY-6879
> URL: https://issues.apache.org/jira/browse/DERBY-6879
> Project: Derby
> Issue Type: Bug
> Components: Services
> Affects Versions: 10.10.2.0
> Environment: Solaris 10.5 on Oracle M5000
> Reporter: Brett Bergquist
> Attachments: derby-6879-2016-07-05.diff, derby-6879-test.diff,
> svnstatus.txt
>
>
> Deadlock between XA timer cleanup task and the ContextManager.cleanupOnError
> Found one Java-level deadlock:
> =============================
> "DRDAConnThread_34":
> waiting to lock monitor 0x0000000104b14d18 (object 0xfffffffd9090f058, a
> org.apache.derby.jdbc.XATransactionState),
> which is held by "Timer-0"
> "Timer-0":
> waiting to lock monitor 0x00000001038b96e8 (object 0xfffffffd9090d8b0, a
> org.apache.derby.impl.jdbc.EmbedConnection40),
> which is held by "DRDAConnThread_34"
>
> Java stack information for the threads listed above:
> ===================================================
> "DRDAConnThread_34":
> at org.apache.derby.jdbc.XATransactionState.cleanupOnError(Unknown
> Source)
> - waiting to lock <0xfffffffd9090f058> (a
> org.apache.derby.jdbc.XATransactionState)
> at
> org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown
> Source)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.cleanupOnError(Unknown
> Source)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown
> Source)
> - locked <0xfffffffd9090d8b0> (a
> org.apache.derby.impl.jdbc.EmbedConnection40)
> at
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown
> Source)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(Unknown
> Source)
> at org.apache.derby.iapi.jdbc.BrokeredPreparedStatement.execute(Unknown
> Source)
> at org.apache.derby.impl.drda.DRDAStatement.execute(Unknown Source)
> at
> org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTTobjects(Unknown
> Source)
> at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTT(Unknown
> Source)
> at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown
> Source)
> at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> "Timer-0":
> at org.apache.derby.impl.jdbc.EmbedConnection.xa_rollback(Unknown Source)
> - waiting to lock <0xfffffffd9090d8b0> (a
> org.apache.derby.impl.jdbc.EmbedConnection40)
> at org.apache.derby.jdbc.XATransactionState.cancel(Unknown Source)
> - locked <0xfffffffd9090f058> (a
> org.apache.derby.jdbc.XATransactionState)
> at
> org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask.run(Unknown
> Source)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
>
> Found 1 deadlock.
> This deadlock caused Derby to create 18000 transaction recovery logs because
> of the XA transaction that did not cleanup in the timeout. Rebooting the
> system would cause a 50 hour boot up time to process the transaction logs so
> recovery had to be done by going to a backup database before the issue
> occurred.