RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs
Well, I see that at the start of the execution, even on the next retry, the isValid is true for the statement. Something comes along and changes the state of isValid. I am having trouble finding out what with the debugger because there are quite a few threads running (statistics daemon for example) and the makeInvalid method is being invoked from various places and threads. So catching the correct one is being difficult to I am going to drop back to the old printf method and try to track the thread that sets the isValid to true and the thread that calls makeInvalid and try to match them up. The fix definitely fixed my problem. There is something in my code that is triggering the deadlock and it is that statement that you mention below that is the one that is causing problems. But I want to understand why the fix is needed and how it gets into this state in the first place. The I will look at my code to figure out why this statement causes a problem. -Original Message- From: Kathey Marsden (Commented) (JIRA) [mailto:j...@apache.org] Sent: Thursday, December 22, 2011 8:20 PM To: derby-dev@db.apache.org Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs [ https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175216#comment-13175216 ] Kathey Marsden commented on DERBY-5552: --- Yes, definitely you want your test to be the new Junit style so that is the place to look. I think I have solved the mystery of the missing XA client traces and filed DERBY-5553 for that issue. I am trying to think of a scenario where we we execute a valid statement would get a lock timeout and then at the time of processing the exception the statement would have become invalid. One reason for a statement becoming invalid would be for instance if a table had been altered, for example if we had a prepared statement SELECT * FROM TAB and then a column was added or dropped. In that scenario the statement would be recompiled long before the lock timeout occurred. If the problem statement is the one with the lock timeout it would seem to be: SELECT DISTINCT t3.ID, t3.OPLOCK FROM PKG_9145E10G.PLPM_PAIR t3, PKG_9145E10G.PLPM_CHASSIS_9145E10G_USER t2, CORE_V1.DEVICE_ENTITY_USER t1, CORE_V1.DEVICE_ENTITY t0 WHERE ((t0.ID = CAST (? AS INTEGER )) AND (((t2.PLPM_PAIR_ID = t3.ID) AND ((t2.ID = t1.ID) AND (t1.DTYPE = 'PlpmChassis9145E10GUser'))) AND (t0.ID = t1.DEVICE_ENTITY_ID))) with 1 parameters begin parameter #1: 249360 :end parameter ERROR 40XL2: A lock could not be obtained within the time requested. The lockTable dump is: Thu Dec 22 07:03:07 PST 2011 I am not sure how that could have gone invalid between the initial execution and the lock time out. Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs - Key: DERBY-5552 URL: https://issues.apache.org/jira/browse/DERBY-5552 Project: Derby Issue Type: Bug Components: Network Server Affects Versions: 10.8.1.2 Environment: Solaris 10, Glassfish V2.1.1, Reporter: Brett Bergquist Priority: Blocker Attachments: appserverstack.txt, client.tar.Z, derby.log, derbystackatshutdown.txt, execute.patch, transactionsleft.txt The issue arrives when multiple XA transactions are done in parallel and there is either a lock timeout or a lock deadlock detected. When this happens the connection is leaked in the Glassfish connection pool and the client thread hangs in org.apache.derby.client.netReply.fill(Reply.java:172). Shutting down the app server fails because the thread has a lock in org.apache.derby.client.net.NetConnection40 and another task is calling org.apache.derby.client.ClientPooledConnection.close(ClientPooledConnection.java:214) which is waiting for the lock. Killing the appsever using kill and then attempting to shutdown Derby network server causes the Network Server to hang. One of the threads hangs waiting for a lock at org.apache.derby.impl.drda.NeworkServerControlImpl.removeFromSessionTable(NetworkServerControlImpl.java:1525) and the main thread has this locked at org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2242) and it itself is waiting for a lock which belongs to a thread that is stuck at org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:118) which is in the TIMED_WAITING state. Only by killing the Network Server using kill is possible at this point. There are transactions left even though all clients have been removed. -- This message is automatically
RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs
It is the IndexStatisticsDaemonImpl that is invalidating the statement. To verify this, I got my test case where it was in this loop. With my printouts I was seeing that the isValid was being set by the thread that was stuck right after the timeout occurred and then I immediately saw makeInvalid being called with action = 40 (UPDATE_STATISTICS). So I attached the debugger to the database engine and set a breakpoint in IndexStatisticsDaemonImpl and then set daemonDisabled to be false. Immediately the lock timeout got reported and the loop terminated and my test case continued on the way it should. So the IndexStatisticsDaemonImpl has the ability to invalidate statements in use by other threads. In this case, it is being invoked right after lock timeout occurs and before the lock timeout exception is being processed. The code sees that the statement is invalid and swallows the lock exception and sets up to recompile and run the statement again. So the fix that I proposed and supplied a patch will handle the code not swallowing the exception, but I am now concerned if the IndexStatisticsDaemonImpl might have another problem if it is reaching into other threads :( -Original Message- From: Bergquist, Brett [mailto:bbergqu...@canoga.com] Sent: Friday, December 23, 2011 10:54 AM To: derby-dev@db.apache.org Subject: RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs Well, I see that at the start of the execution, even on the next retry, the isValid is true for the statement. Something comes along and changes the state of isValid. I am having trouble finding out what with the debugger because there are quite a few threads running (statistics daemon for example) and the makeInvalid method is being invoked from various places and threads. So catching the correct one is being difficult to I am going to drop back to the old printf method and try to track the thread that sets the isValid to true and the thread that calls makeInvalid and try to match them up. The fix definitely fixed my problem. There is something in my code that is triggering the deadlock and it is that statement that you mention below that is the one that is causing problems. But I want to understand why the fix is needed and how it gets into this state in the first place. The I will look at my code to figure out why this statement causes a problem. -Original Message- From: Kathey Marsden (Commented) (JIRA) [mailto:j...@apache.org] Sent: Thursday, December 22, 2011 8:20 PM To: derby-dev@db.apache.org Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs [ https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175216#comment-13175216 ] Kathey Marsden commented on DERBY-5552: --- Yes, definitely you want your test to be the new Junit style so that is the place to look. I think I have solved the mystery of the missing XA client traces and filed DERBY-5553 for that issue. I am trying to think of a scenario where we we execute a valid statement would get a lock timeout and then at the time of processing the exception the statement would have become invalid. One reason for a statement becoming invalid would be for instance if a table had been altered, for example if we had a prepared statement SELECT * FROM TAB and then a column was added or dropped. In that scenario the statement would be recompiled long before the lock timeout occurred. If the problem statement is the one with the lock timeout it would seem to be: SELECT DISTINCT t3.ID, t3.OPLOCK FROM PKG_9145E10G.PLPM_PAIR t3, PKG_9145E10G.PLPM_CHASSIS_9145E10G_USER t2, CORE_V1.DEVICE_ENTITY_USER t1, CORE_V1.DEVICE_ENTITY t0 WHERE ((t0.ID = CAST (? AS INTEGER )) AND (((t2.PLPM_PAIR_ID = t3.ID) AND ((t2.ID = t1.ID) AND (t1.DTYPE = 'PlpmChassis9145E10GUser'))) AND (t0.ID = t1.DEVICE_ENTITY_ID))) with 1 parameters begin parameter #1: 249360 :end parameter ERROR 40XL2: A lock could not be obtained within the time requested. The lockTable dump is: Thu Dec 22 07:03:07 PST 2011 I am not sure how that could have gone invalid between the initial execution and the lock time out. Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs - Key: DERBY-5552 URL: https://issues.apache.org/jira/browse/DERBY-5552 Project: Derby Issue Type: Bug Components: Network Server Affects Versions: 10.8.1.2 Environment: Solaris 10, Glassfish V2.1.1, Reporter: Brett Bergquist Priority: Blocker Attachments
Re: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs
On 23.12.2011 17:49, Bergquist, Brett wrote: It is the IndexStatisticsDaemonImpl that is invalidating the statement. To verify this, I got my test case where it was in this loop. With my printouts I was seeing that the isValid was being set by the thread that was stuck right after the timeout occurred and then I immediately saw makeInvalid being called with action = 40 (UPDATE_STATISTICS). So I attached the debugger to the database engine and set a breakpoint in IndexStatisticsDaemonImpl and then set daemonDisabled to be false. Immediately the lock timeout got reported and the loop terminated and my test case continued on the way it should. I haven't followed this closely, just a few quick questions/observations to check if I'm understanding this correctly: o Does disabling the istat daemon suffice as a workaround? o With the istat deamon disabled, are you still seeing timeouts? Thanks, -- Kristian So the IndexStatisticsDaemonImpl has the ability to invalidate statements in use by other threads. In this case, it is being invoked right after lock timeout occurs and before the lock timeout exception is being processed. The code sees that the statement is invalid and swallows the lock exception and sets up to recompile and run the statement again. So the fix that I proposed and supplied a patch will handle the code not swallowing the exception, but I am now concerned if the IndexStatisticsDaemonImpl might have another problem if it is reaching into other threads :( -Original Message- From: Bergquist, Brett [mailto:bbergqu...@canoga.com] Sent: Friday, December 23, 2011 10:54 AM To: derby-dev@db.apache.org Subject: RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs Well, I see that at the start of the execution, even on the next retry, the isValid is true for the statement. Something comes along and changes the state of isValid. I am having trouble finding out what with the debugger because there are quite a few threads running (statistics daemon for example) and the makeInvalid method is being invoked from various places and threads. So catching the correct one is being difficult to I am going to drop back to the old printf method and try to track the thread that sets the isValid to true and the thread that calls makeInvalid and try to match them up. The fix definitely fixed my problem. There is something in my code that is triggering the deadlock and it is that statement that you mention below that is the one that is causing problems. But I want to understand why the fix is needed and how it gets into this state in the first place. The I will look at my code to figure out why this statement causes a problem. -Original Message- From: Kathey Marsden (Commented) (JIRA) [mailto:j...@apache.org] Sent: Thursday, December 22, 2011 8:20 PM To: derby-dev@db.apache.org Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs [ https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175216#comment-13175216 ] Kathey Marsden commented on DERBY-5552: --- Yes, definitely you want your test to be the new Junit style so that is the place to look. I think I have solved the mystery of the missing XA client traces and filed DERBY-5553 for that issue. I am trying to think of a scenario where we we execute a valid statement would get a lock timeout and then at the time of processing the exception the statement would have become invalid. One reason for a statement becoming invalid would be for instance if a table had been altered, for example if we had a prepared statement SELECT * FROM TAB and then a column was added or dropped. In that scenario the statement would be recompiled long before the lock timeout occurred. If the problem statement is the one with the lock timeout it would seem to be: SELECT DISTINCT t3.ID, t3.OPLOCK FROM PKG_9145E10G.PLPM_PAIR t3, PKG_9145E10G.PLPM_CHASSIS_9145E10G_USER t2, CORE_V1.DEVICE_ENTITY_USER t1, CORE_V1.DEVICE_ENTITY t0 WHERE ((t0.ID = CAST (? AS INTEGER )) AND (((t2.PLPM_PAIR_ID = t3.ID) AND ((t2.ID = t1.ID) AND (t1.DTYPE = 'PlpmChassis9145E10GUser'))) AND (t0.ID = t1.DEVICE_ENTITY_ID))) with 1 parameters begin parameter #1: 249360 :end parameter ERROR 40XL2: A lock could not be obtained within the time requested. The lockTable dump is: Thu Dec 22 07:03:07 PST 2011 I am not sure how that could have gone invalid between the initial execution and the lock time out. Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs - Key: DERBY-5552 URL: https
RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs
If I disable the istat daemon, I still the the timeout but now the error is reported as it should and the transaction is aborted and my code continues. With the itstat deamon enabled, then the lock timeout is discovered and signaled by exception being thrown but between the time it gets caught and processed, the istat daemon has set the isValid property of the statement to false. The original lock timeout exception is caught but then the Activation.checkStatementValidity() is called in the exception handler. This finds that the statement is not valid (isValid == false) and throws its own exception indicating that a recompilation is required, swallowing the original lock timeout exception in the process. This causes the existing statement to be rebuilt and re-executed but the locks are still present, the lock timeout will occur again, the lock timeout exception will be thrown, the istat daemon will set the isValid to false again, etc. The process just continues forever. The patch that I attached to the bug only calls Activation.checkStatementValidity() in the exception handler if the caught exception is not set to REPORT_ALWAYS. This seem okay and to make sense because if the code is handling an exception that is to be REPORT_ALWAYS, there is no sense looking for a statement that is invalid; just report the exception. Brett From: Kristian Waagan [kristian.waa...@oracle.com] Sent: Friday, December 23, 2011 5:08 PM To: derby-dev@db.apache.org Subject: Re: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs On 23.12.2011 17:49, Bergquist, Brett wrote: It is the IndexStatisticsDaemonImpl that is invalidating the statement. To verify this, I got my test case where it was in this loop. With my printouts I was seeing that the isValid was being set by the thread that was stuck right after the timeout occurred and then I immediately saw makeInvalid being called with action = 40 (UPDATE_STATISTICS). So I attached the debugger to the database engine and set a breakpoint in IndexStatisticsDaemonImpl and then set daemonDisabled to be false. Immediately the lock timeout got reported and the loop terminated and my test case continued on the way it should. I haven't followed this closely, just a few quick questions/observations to check if I'm understanding this correctly: o Does disabling the istat daemon suffice as a workaround? o With the istat deamon disabled, are you still seeing timeouts? Thanks, -- Kristian So the IndexStatisticsDaemonImpl has the ability to invalidate statements in use by other threads. In this case, it is being invoked right after lock timeout occurs and before the lock timeout exception is being processed. The code sees that the statement is invalid and swallows the lock exception and sets up to recompile and run the statement again. So the fix that I proposed and supplied a patch will handle the code not swallowing the exception, but I am now concerned if the IndexStatisticsDaemonImpl might have another problem if it is reaching into other threads :( -Original Message- From: Bergquist, Brett [mailto:bbergqu...@canoga.com] Sent: Friday, December 23, 2011 10:54 AM To: derby-dev@db.apache.org Subject: RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs Well, I see that at the start of the execution, even on the next retry, the isValid is true for the statement. Something comes along and changes the state of isValid. I am having trouble finding out what with the debugger because there are quite a few threads running (statistics daemon for example) and the makeInvalid method is being invoked from various places and threads. So catching the correct one is being difficult to I am going to drop back to the old printf method and try to track the thread that sets the isValid to true and the thread that calls makeInvalid and try to match them up. The fix definitely fixed my problem. There is something in my code that is triggering the deadlock and it is that statement that you mention below that is the one that is causing problems. But I want to understand why the fix is needed and how it gets into this state in the first place. The I will look at my code to figure out why this statement causes a problem. -Original Message- From: Kathey Marsden (Commented) (JIRA) [mailto:j...@apache.org] Sent: Thursday, December 22, 2011 8:20 PM To: derby-dev@db.apache.org Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs [ https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175216#comment-13175216 ] Kathey