[jira] [Commented] (DERBY-6879) Engine deadlock between XA timeout handling and cleanupOnError

Brett Bergquist (JIRA) Fri, 08 Jul 2016 05:58:32 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367630#comment-15367630
 ]


Brett Bergquist commented on DERBY-6879:
----------------------------------------

I will update the patch, Brian.

"1" snuck in there via my IDE.

"2" I am working on Windows, and the change to "derbyTesting.jar.lastcontents" 
was made by running the Ant target "refreshjardriftcheck", which the build 
prompted me to run when it saw the new "DeadlockWatchdog.class".   So any 
re-ordering was done by that target, possibly because I am developing on 
Windows.   I can manually fix that file so that only the addition of 
DeadlockWatchdog.class remains.

"3" Is there a document describing the "sane" versus "insane" builds?   I 
understand what they do, and by looking at "build.xml" I can see how to 
trigger each, but I can find only one reference in "BUILDING.html".   There 
seems to me to be a lot of "magic" in building and testing, which I found to 
be an impediment to contributing.

"4" The IDE snuck this in when reformatting this section according to my coding 
guidelines. Looking through the history of the code while comparing the 
source from 10.10.2.0 to the trunk, I saw braces being removed for no apparent 
reason.   In the change between revisions 1364916 and 1364917, I see a brace 
being added for a bare "if".

Personally, having done software development since 1984, I have come to the 
conclusion that always having the braces has no downside and has an upside 
whenever code is added to an "if" (it ensures that the code is added to the 
body of the "if" and you don't have to add anything else).  Also, according 
to the coding standard referenced on

http://wiki.apache.org/db-derby/DerbyContributorChecklist

which links to:

http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-142311.html#449

having the braces is the required style.  I will remove the added braces, 
however.
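
To make the point about braces concrete, here is a small sketch. It is not 
Derby code; the class and method names are made up for illustration. It shows 
a statement added under a bare "if" that looks guarded but always runs.

```java
class BareIfPitfall {
    // Hypothetical example: a second statement was added later under a bare "if".
    static int withoutBraces(boolean failed) {
        int cleanups = 0;
        if (failed)
            cleanups++;   // the original single guarded statement
            cleanups++;   // added later: indented like the "if" body, but always executed
        return cleanups;
    }

    // The same edit when the braces were there from the start.
    static int withBraces(boolean failed) {
        int cleanups = 0;
        if (failed) {
            cleanups++;
            cleanups++;   // added later: really inside the "if"
        }
        return cleanups;
    }

    public static void main(String[] args) {
        System.out.println("without braces, failed=false: " + withoutBraces(false));
        System.out.println("with braces,    failed=false: " + withBraces(false));
    }
}
```

With failed=false the braced version does no cleanup, while the bare version 
still runs the statement that was added later.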

As for further review, just a sanity check on the change would be good.  

The original problem to be solved is as follows. A connection that is 
performing an XA transaction and discovers an error that must be cleaned up 
holds a lock on an EmbedConnection and then requires a lock on the 
XATransactionState. If the same XA transaction times out (because of the XA 
transaction timer value) while that work is in progress, the timeout handling 
holds a lock on the XATransactionState and then requires a lock on the 
EmbedConnection. The two threads take the same two locks in opposite order, 
which triggers the deadlock.
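
The lock ordering described above can be sketched in isolation. This is only 
an illustration under my own assumptions, not Derby code: two ReentrantLock 
objects stand in for the EmbedConnection and XATransactionState monitors, and 
tryLock with a timeout is used instead of "synchronized" so that the demo 
reports the stall rather than hanging. All class and variable names are 
hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {
    // Takes "first", waits until the other thread also holds its first lock,
    // then tries "second". Returns true only if the second lock was obtained.
    static boolean attempt(ReentrantLock first, ReentrantLock second,
                           CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean got = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            // Keep holding "first" until both threads have made their attempt,
            // as a real blocked thread would.
            bothTried.countDown();
            bothTried.await();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Returns {workerGotStateLock, timerGotConnectionLock}.
    static boolean[] run() {
        ReentrantLock connectionLock = new ReentrantLock(); // stand-in for EmbedConnection
        ReentrantLock xaStateLock = new ReentrantLock();    // stand-in for XATransactionState
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        // Error cleanup path: connection first, then XA state.
        Thread worker = new Thread(() ->
            gotSecond[0] = attempt(connectionLock, xaStateLock, bothHoldFirst, bothTried));
        // Timeout path: XA state first, then connection.
        Thread timer = new Thread(() ->
            gotSecond[1] = attempt(xaStateLock, connectionLock, bothHoldFirst, bothTried));

        worker.start();
        timer.start();
        try {
            worker.join();
            timer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gotSecond;
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("worker got XA state lock:  " + r[0]);
        System.out.println("timer got connection lock: " + r[1]);
    }
}
```

Neither thread obtains its second lock. With plain monitors there is no 
timeout, so both threads would wait forever, which is what the thread dump in 
the issue shows.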

The change in the patch fixes this problem in the smallest way that I know of, 
by altering the locking on the XATransactionState when it is invoked via the 
timeout. It removes the synchronization on the "cancel" method and instead 
synchronizes inline on the "XATransactionState" while its internals are being 
altered, then releases the lock so that EmbedConnection.rollback can be 
invoked without the lock, and finally acquires the lock again to finish 
cleaning up the "XATransactionState".
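
The shape of that change can be sketched roughly as follows. This is a 
simplified stand-in and not the actual patch: the classes, fields and the 
"cancelStarted" guard are hypothetical and exist only to show the three 
steps (lock to alter state, unlock for the rollback, lock again to finish).

```java
class ConnectionStandIn {
    private boolean stateLockHeldDuringRollback;

    // Stand-in for the connection's rollback, which needs the connection monitor.
    // It records whether the caller still held the XA state monitor.
    synchronized void rollback(Object xaState) {
        stateLockHeldDuringRollback = Thread.holdsLock(xaState);
    }

    synchronized boolean wasStateLockHeldDuringRollback() {
        return stateLockHeldDuringRollback;
    }
}

class XaStateStandIn {
    private final ConnectionStandIn connection;
    private boolean cancelStarted; // hypothetical guard, an assumption of this sketch
    private boolean cleanedUp;

    XaStateStandIn(ConnectionStandIn connection) {
        this.connection = connection;
    }

    // Deliberately not a synchronized method.
    void cancel() {
        // Step 1: hold the state lock only while the internals are altered.
        synchronized (this) {
            if (cancelStarted) {
                return;
            }
            cancelStarted = true;
        }

        // Step 2: the state lock is released here, so taking the connection
        // monitor can no longer complete a lock cycle with the cleanup path.
        connection.rollback(this);

        // Step 3: take the state lock again to finish cleaning up.
        synchronized (this) {
            cleanedUp = true;
        }
    }

    synchronized boolean isCleanedUp() {
        return cleanedUp;
    }
}

class CancelWithoutNestedLocksDemo {
    public static void main(String[] args) {
        ConnectionStandIn connection = new ConnectionStandIn();
        XaStateStandIn state = new XaStateStandIn(connection);
        state.cancel();
        System.out.println("state lock held during rollback: "
                + connection.wasStateLockHeldDuringRollback());
        System.out.println("cleaned up: " + state.isCleanedUp());
    }
}
```

Because the lock is dropped between steps 1 and 3, another thread may see or 
change the state in between, so anything the final block relies on has to be 
re-checked there rather than assumed from step 1.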


> Engine deadlock between XA timeout handling and cleanupOnError
> --------------------------------------------------------------
>
>                 Key: DERBY-6879
>                 URL: https://issues.apache.org/jira/browse/DERBY-6879
>             Project: Derby
>          Issue Type: Bug
>          Components: Services
>    Affects Versions: 10.10.2.0
>         Environment: Solaris 10.5 on Oracle M5000 
>            Reporter: Brett Bergquist
>         Attachments: derby-6879-2016-07-05.diff, derby-6879-test.diff, 
> svnstatus.txt
>
>
> Deadlock between XA timer cleanup task and the ContextManager.cleanupOnError
> Found one Java-level deadlock:
> =============================
> "DRDAConnThread_34":
>   waiting to lock monitor 0x0000000104b14d18 (object 0xfffffffd9090f058, a 
> org.apache.derby.jdbc.XATransactionState),
>   which is held by "Timer-0"
> "Timer-0":
>   waiting to lock monitor 0x00000001038b96e8 (object 0xfffffffd9090d8b0, a 
> org.apache.derby.impl.jdbc.EmbedConnection40),
>   which is held by "DRDAConnThread_34"
>  
> Java stack information for the threads listed above:
> ===================================================
> "DRDAConnThread_34":
>      at org.apache.derby.jdbc.XATransactionState.cleanupOnError(Unknown 
> Source)
>      - waiting to lock <0xfffffffd9090f058> (a 
> org.apache.derby.jdbc.XATransactionState)
>      at 
> org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown 
> Source)
>      at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.cleanupOnError(Unknown 
> Source)
>      at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>      at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>      at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>      at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>      - locked <0xfffffffd9090d8b0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection40)
>      at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
> Source)
>      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(Unknown 
> Source)
>      at org.apache.derby.iapi.jdbc.BrokeredPreparedStatement.execute(Unknown 
> Source)
>      at org.apache.derby.impl.drda.DRDAStatement.execute(Unknown Source)
>      at 
> org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTTobjects(Unknown 
> Source)
>      at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTT(Unknown 
> Source)
>      at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown 
> Source)
>      at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> "Timer-0":
>      at org.apache.derby.impl.jdbc.EmbedConnection.xa_rollback(Unknown Source)
>      - waiting to lock <0xfffffffd9090d8b0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection40)
>      at org.apache.derby.jdbc.XATransactionState.cancel(Unknown Source)
>      - locked <0xfffffffd9090f058> (a 
> org.apache.derby.jdbc.XATransactionState)
>      at 
> org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask.run(Unknown 
> Source)
>      at java.util.TimerThread.mainLoop(Timer.java:555)
>      at java.util.TimerThread.run(Timer.java:505)
>  
> Found 1 deadlock.
> This deadlock caused Derby to create 18000 transaction recovery logs because 
> of the XA transaction that did not cleanup in the timeout.  Rebooting the 
> system would cause a 50 hour boot up time to process the transaction logs so 
> recovery had to be done by going to a backup database before the issue 
> occurred.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
