[
https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175192#comment-13175192
]
Brett Bergquist commented on DERBY-5552:
----------------------------------------
I guess I am confused as well, Kathey, as I had the debugger attached and do
see it going through the XA code in Derby on the client side. The application
server is set up with the ClientXADataSource and I do see it calling xa.commit
and xa.end, for example. The ClientXADataSource is required; otherwise the
error:
Local transaction already has 1 non-XA Resource: cannot add more
resources.
occurs. So although there is only one database (Derby), it is using XA. The
database is being accessed through EJBs, through EclipseLink, and also
through a custom JCA interface driving Message Driven Beans.
For the test case, I had to limit things to keep my sanity. So I stopped as
much access to the database as I could while still triggering the problem.
Eventually I got down to one thread of control being processed by EJBs, which
do start new transactions. Even with only this one access going on, I hit the
lockup issue that I posted. That is when I found the issue that I mentioned.
So whether or not this is the real issue, I don't know, but when I tried to
get as simple a condition as possible, I ran into this.
Thinking about it now, I don't understand why this would not be hit in the
normal case of a lock timeout being thrown. The only thing that I can think of
is that Activation.checkStatementValidity() is seeing the statement as valid
and not trying to recompile it. Why it occurred in my case, where I see the
"isValid" member set to false, I don't know. I will try to hook up the
debugger and determine the difference so that I can understand it better.
I do believe that the code should not swallow an exception such as a lock
timeout, regardless of whether the statement is still reporting itself as
valid. This is definitely a condition that will cause an infinite loop of
processing.
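To make the pattern I am describing concrete, here is a minimal sketch. It is
not Derby's actual code: the Activation interface, the AlwaysTimingOut helper,
and both execute methods are hypothetical names made up for illustration, and
the retry loop is capped by maxAttempts only so that the sketch terminates. The
SQLStates 40XL1 (lock timeout) and 40001 (deadlock) are the ones Derby reports
for these conditions.

```java
import java.sql.SQLException;

public class RecompileLoopSketch {

    /** Hypothetical stand-in for a statement activation; not Derby's interface. */
    interface Activation {
        boolean isValid();
        void recompile();
        void execute() throws SQLException;
    }

    /** Test double: the statement never becomes valid and every execute times out. */
    static class AlwaysTimingOut implements Activation {
        int executions = 0;
        public boolean isValid() { return false; }
        public void recompile() { /* recompiling does not help in this scenario */ }
        public void execute() throws SQLException {
            executions++;
            throw new SQLException("lock timeout", "40XL1");
        }
    }

    /**
     * Problematic pattern: when the statement is invalid, the exception is
     * dropped and the statement is retried. Without the maxAttempts cap this
     * would loop forever. Returns the number of attempts made.
     */
    static int executeSwallowing(Activation a, int maxAttempts) throws SQLException {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                a.execute();
                return attempts;
            } catch (SQLException e) {
                if (a.isValid()) {
                    throw e;
                }
                a.recompile(); // the lock timeout is lost here
            }
        }
        return attempts;
    }

    /** True for Derby's lock timeout (40XL1) and deadlock (40001) SQLStates. */
    static boolean isLockFailure(SQLException e) {
        String state = e.getSQLState();
        return "40XL1".equals(state) || "40001".equals(state);
    }

    /**
     * Suggested pattern: a lock failure is always rethrown, whether or not
     * the statement is valid; only other failures on an invalid statement
     * lead to a recompile and retry.
     */
    static int executePropagating(Activation a, int maxAttempts) throws SQLException {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                a.execute();
                return attempts;
            } catch (SQLException e) {
                if (isLockFailure(e) || a.isValid()) {
                    throw e;
                }
                a.recompile();
            }
        }
        return attempts;
    }
}
```

With the AlwaysTimingOut helper, executeSwallowing uses up every attempt it is
given, while executePropagating rethrows the lock timeout after the first
execute.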
Again, I appreciate the help and your time. If I gain an understanding of how
the condition is triggered, I will look to write a test case for it. I am
reading the Derby testing docs, which cover using JUnit; I assume that is the
correct path for newer test cases, correct?
> Derby threads hanging when using ClientXADataSource and a deadlock or lock
> timeout occurs
> -----------------------------------------------------------------------------------------
>
> Key: DERBY-5552
> URL: https://issues.apache.org/jira/browse/DERBY-5552
> Project: Derby
> Issue Type: Bug
> Components: Network Server
> Affects Versions: 10.8.1.2
> Environment: Solaris 10, Glassfish V2.1.1,
> Reporter: Brett Bergquist
> Priority: Blocker
> Attachments: appserverstack.txt, client.tar.Z, derby.log,
> derbystackatshutdown.txt, execute.patch, transactionsleft.txt
>
>
> The issue arises when multiple XA transactions are run in parallel and
> there is either a lock timeout or a lock deadlock detected. When this
> happens the connection is leaked in the Glassfish connection pool and the
> client thread hangs in
> "org.apache.derby.client.net.Reply.fill(Reply.java:172)".
> Shutting down the app server fails because the thread has a lock in
> "org.apache.derby.client.net.NetConnection40" and another task is calling
> "org.apache.derby.client.ClientPooledConnection.close(ClientPooledConnection.java:214)"
> which is waiting for the lock.
> Killing the app server using "kill" and then attempting to shut down the
> Derby Network Server causes the Network Server to hang. One of the threads
> hangs waiting for a lock at
> "org.apache.derby.impl.drda.NetworkServerControlImpl.removeFromSessionTable(NetworkServerControlImpl.java:1525)"
> and the "main" thread has this locked at
> "org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2242)"
> and it itself is waiting for a lock which belongs to a thread that is stuck
> at
> "org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:118)"
> which is in the TIMED_WAITING state.
> At this point the Network Server can only be stopped by killing it with "kill".
> There are transactions left even though all clients have been removed.