[ https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175192#comment-13175192 ]
Brett Bergquist commented on DERBY-5552:
----------------------------------------

I guess I am confused as well, Kathey, as I had the debugger attached and do see it going through the XA code in Derby on the client side. The application server is set up with the ClientXADataSource, and I do see it calling xa.commit and xa.end, for example. The ClientXADataSource is required; otherwise this error occurs:

    Local transaction already has 1 non-XA Resource: cannot add more resources.

So although there is only one database (Derby), it is using XA. The database is being accessed through EJBs, through EclipseLink, and also through a custom JCA interface driving Message Driven Beans.

For the test case, I had to limit things to keep my sanity. I stopped as much access to the database as I could while still triggering the problem. Eventually I got down to one thread of control being processed by EJBs, which do start new transactions. Even with only this one access going on, I hit the lockup issue that I posted. That is when I found the issue that I mentioned. So whether or not this is the real issue, I don't know, but when I tried to get as simple a condition as possible, I ran into this.

Thinking about it now, I don't understand why this would not be hit in the normal case of a lock timeout being thrown. The only thing I can think of is that Activation.checkStatementValidity() is seeing the statement as valid and is not going to try to recompile it. Why it occurred in my case, where I see the "isValid" member set to false, I don't know. I will try to hitch up the debugger and determine the difference so that I can understand it better.

I do believe that the code should not swallow an exception such as a lock timeout, regardless of whether the statement is still reported as valid. Swallowing it is definitely a condition that will cause an infinite loop of processing.

Again, I appreciate the help and your time.
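To make the concern about swallowed exceptions concrete, here is a minimal sketch of the two shapes a recompile-and-retry loop can take. It is not Derby's actual code: RecompileLoopSketch, FakeStatement, executeSwallowing and executePropagating are hypothetical names invented for illustration, and the iteration cap exists only so that the sketch terminates. The SQLStates used (40XL1 for a lock timeout, 40001 for a deadlock) are the ones Derby reports.

```java
import java.sql.SQLException;

// Illustration only: hypothetical names, not Derby's implementation.
class RecompileLoopSketch {

    /** Hypothetical stand-in for a statement whose compiled plan is marked invalid. */
    static final class FakeStatement {
        boolean valid = false; // mirrors the "isValid == false" observation
        int attempts = 0;

        void execute() throws SQLException {
            attempts++;
            // Every attempt fails with a lock timeout (SQLState 40XL1).
            throw new SQLException(
                "A lock could not be obtained within the time requested", "40XL1");
        }
    }

    /**
     * Problematic shape: any exception is dropped while the statement looks
     * invalid, so a lock timeout just triggers another attempt. Returns the
     * number of attempts made before the artificial cap was reached.
     */
    static int executeSwallowing(FakeStatement s, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            try {
                s.execute();
                break;
            } catch (SQLException e) {
                if (s.valid) {
                    throw new IllegalStateException(e);
                }
                // Swallowed: the loop retries as if a recompile would help.
            }
        }
        return s.attempts;
    }

    /** Safer shape: lock timeouts and deadlocks always reach the caller. */
    static void executePropagating(FakeStatement s) throws SQLException {
        while (true) {
            try {
                s.execute();
                return;
            } catch (SQLException e) {
                String state = e.getSQLState();
                boolean lockProblem = "40XL1".equals(state) || "40001".equals(state);
                if (lockProblem || s.valid) {
                    throw e; // never hide a lock error behind a retry
                }
                s.valid = true; // pretend the recompile succeeded, then retry
            }
        }
    }

    public static void main(String[] args) {
        FakeStatement a = new FakeStatement();
        System.out.println("swallowing attempts: " + executeSwallowing(a, 5));

        FakeStatement b = new FakeStatement();
        try {
            executePropagating(b);
        } catch (SQLException e) {
            System.out.println("propagated SQLState: " + e.getSQLState());
        }
    }
}
```

Without the cap, the first variant would keep retrying for as long as the lock timeout recurs, which matches the infinite loop described above. The second variant surfaces the error after a single attempt.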
If I gain an understanding of how the condition is triggered, I will look to write a test case for it. I am reading the Derby testing docs, which describe using JUnit. I assume that is the correct path for newer test cases, correct?

> Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs
> -----------------------------------------------------------------------------------------
>
>                 Key: DERBY-5552
>                 URL: https://issues.apache.org/jira/browse/DERBY-5552
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Server
>    Affects Versions: 10.8.1.2
>         Environment: Solaris 10, Glassfish V2.1.1
>            Reporter: Brett Bergquist
>            Priority: Blocker
>         Attachments: appserverstack.txt, client.tar.Z, derby.log, derbystackatshutdown.txt, execute.patch, transactionsleft.txt
>
>
> The issue arises when multiple XA transactions are done in parallel and there is either a lock timeout or a lock deadlock detected. When this happens, the connection is leaked in the Glassfish connection pool and the client thread hangs in "org.apache.derby.client.netReply.fill(Reply.java:172)".
> Shutting down the app server fails because the thread holds a lock in "org.apache.derby.client.net.NetConnection40" and another task is calling "org.apache.derby.client.ClientPooledConnection.close(ClientPooledConnection.java:214)", which is waiting for the lock.
> Killing the app server using "kill" and then attempting to shut down the Derby network server causes the Network Server to hang. One of the threads hangs waiting for a lock at "org.apache.derby.impl.drda.NetworkServerControlImpl.removeFromSessionTable(NetworkServerControlImpl.java:1525)", and the "main" thread has this locked at "org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2242)" and is itself waiting for a lock that belongs to a thread stuck at "org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:118)", which is in the TIMED_WAITING state.
> At this point the Network Server can only be stopped by killing it using "kill".
> There are transactions left even though all clients have been removed.