I opened DERBY-5552 https://issues.apache.org/jira/browse/DERBY-5552
I attached client-side traces using traceLevel=2145 (TRACE_XA_CALLS|TRACE_PROTOCOL_FLOWS|TRACE_CONNECTS|TRACE_CONNECTION_CALLS). I don't know if more is needed. I have server-side traces, but they are 93 MB uncompressed and 13 MB compressed. Is there something I should look for in there to narrow down which trace files to include and upload?

I have attached jstack traces of the application server and the Derby Network Server at shutdown showing the hung threads. I have also attached the output of "select * from syscs_diag.transaction_table", taken when there are no clients and no other database activity, showing transactions that are still present.

I am trying to narrow this down to a better test case but have not been able to at this point. It is, however, repeatable every time with my J2EE application and the test setup that I have. Any pointers to further areas to look at with a debugger, or to additional tracing output, will be greatly appreciated.

From: Katherine Marsden [mailto:kmarsdende...@sbcglobal.net]
Sent: Wednesday, December 21, 2011 7:25 PM
To: derby-dev@db.apache.org
Subject: Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

On 12/21/2011 3:14 PM, Bergquist, Brett wrote:

Will get to this tomorrow but I do see one comment in the code that I don't understand:

In DRDAConnThread.java, I see:

    if (severity > CodePoint.SVRCOD_ERROR)
    {
        // For a session ending error > CodePoint.SRVCOD_ERROR you cannot
        // send a SQLERRRM. A CMDCHKRM is required. In XA if there is a
        // lock timeout it ends the whole session. I am not sure this
        // is the correct behaviour but if it occurs we have to send
        // a CMDCHKRM instead of SQLERRM
        writeCMDCHKRM(severity);
    }

So what does the comment "In XA if there is a lock timeout it ends the whole session" refer to? Why would a lock timeout be any different from any other standard database error? It is as if this is hinting at what is happening. This is a real XA transaction.
What I see is that after the timeout is hit (I see it hit in Timeout.java), the error is propagated to the app server. The app server then attempts to get the error text (I don't have the code handy), which sends a request back to Derby. This then fails with a No Connection error returned from Derby. It is as if, once this error is hit, the connection between the app server and Derby no longer exists.

I agree that would not be the correct behavior if a lock timeout killed the session. As this is a server-side comment, it would imply that this is a problem with embedded as well, but it is hard to believe it would not have been exposed before now. Thanks for working on a reproduction for this. I don't see the comment in the original code import, and the annotation is not clear because it mentions the backing out of another fix, so I am not sure who first noticed this behavior.
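
For anyone reading the attached client traces: the traceLevel=2145 value mentioned at the top of this message is a bitmask. Below is a minimal sketch of how it decomposes. The numeric values are written out locally and are assumed to mirror the TRACE_* constants documented for the Derby client data source; the class name TraceLevelDecoder and the partial flag table are hypothetical and only cover the four flags used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TraceLevelDecoder {
    // Partial flag table. Values are assumed to match the Derby client's
    // documented TRACE_* constants; only the four flags used above are listed.
    public static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("TRACE_CONNECTION_CALLS", 0x1);
        FLAGS.put("TRACE_CONNECTS", 0x20);
        FLAGS.put("TRACE_PROTOCOL_FLOWS", 0x40);
        FLAGS.put("TRACE_XA_CALLS", 0x800);
    }

    // Returns the names of all known flags whose bit is set in traceLevel.
    public static List<String> decode(int traceLevel) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : FLAGS.entrySet()) {
            if ((traceLevel & e.getValue()) != 0) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // OR the four flags together: 2048 + 64 + 32 + 1
        int level = 0x800 | 0x40 | 0x20 | 0x1;
        System.out.println(level + " -> " + decode(level));
    }
}
```

Under those assumed values the four flags OR together to 2145 (2048 + 64 + 32 + 1), which matches the traceLevel used for the attached traces.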