RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs

2011-12-23 Thread Bergquist, Brett
Well, I see that at the start of the execution, even on the next retry, the 
isValid is true for the statement.  Something comes along and changes the state 
of isValid.  I am having trouble finding out what with the debugger because 
there are quite a few threads running (statistics daemon for example) and the 
makeInvalid method is being invoked from various places and threads.   So 
catching the correct one is being difficult to I am going to drop back to the 
old printf method and try to track the thread that sets the isValid to true 
and the thread that calls makeInvalid and try to match them up.

The fix definitely fixed my problem.  There is something in my code that is 
triggering the deadlock and it is that statement that you mention below that is 
the one that is causing problems.  But I want to understand why the fix is 
needed and how it gets into this state in the first place.  The I will look at 
my code to figure out why this statement causes a problem.

-Original Message-
From: Kathey Marsden (Commented) (JIRA) [mailto:j...@apache.org] 
Sent: Thursday, December 22, 2011 8:20 PM
To: derby-dev@db.apache.org
Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
ClientXADataSource and a deadlock or lock timeout occurs


[ 
https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175216#comment-13175216
 ] 

Kathey Marsden commented on DERBY-5552:
---

Yes, definitely you want your test to be the new Junit style so that is the 
place to look.

I think I have solved the mystery of the missing XA client  traces and filed 
DERBY-5553 for that issue.

I am trying to think of a scenario where we we execute a valid statement would 
get a lock timeout and then at the time of processing the exception the 
statement would have become invalid.  One reason for a statement becoming 
invalid would be for instance if a table had been altered, for example if we 
had a prepared statement SELECT * FROM TAB and then a column was added or 
dropped.  In that scenario the statement would be recompiled long before the 
lock timeout occurred.  If the problem statement is the one with the lock 
timeout it would seem to be:

 SELECT DISTINCT t3.ID, t3.OPLOCK FROM PKG_9145E10G.PLPM_PAIR t3, 
PKG_9145E10G.PLPM_CHASSIS_9145E10G_USER t2, CORE_V1.DEVICE_ENTITY_USER t1, 
CORE_V1.DEVICE_ENTITY t0 WHERE ((t0.ID = CAST (? AS INTEGER )) AND 
(((t2.PLPM_PAIR_ID = t3.ID) AND ((t2.ID = t1.ID) AND (t1.DTYPE = 
'PlpmChassis9145E10GUser'))) AND (t0.ID = t1.DEVICE_ENTITY_ID))) with 1 
parameters begin parameter #1: 249360 :end parameter 
ERROR 40XL2: A lock could not be obtained within the time requested.  The 
lockTable dump is: 
Thu Dec 22 07:03:07 PST 2011


I am not sure how that could have gone invalid between the initial execution 
and the lock time out.



 Derby threads hanging when using ClientXADataSource and a deadlock or lock 
 timeout occurs
 -

 Key: DERBY-5552
 URL: https://issues.apache.org/jira/browse/DERBY-5552
 Project: Derby
  Issue Type: Bug
  Components: Network Server
Affects Versions: 10.8.1.2
 Environment: Solaris 10, Glassfish V2.1.1,
Reporter: Brett Bergquist
Priority: Blocker
 Attachments: appserverstack.txt, client.tar.Z, derby.log, 
 derbystackatshutdown.txt, execute.patch, transactionsleft.txt


 The issue arrives when multiple XA transactions are done in parallel and 
 there is either a lock timeout or a lock deadlock detected.  When this 
 happens the connection is leaked in the Glassfish connection pool and the 
 client thread hangs in 
 org.apache.derby.client.netReply.fill(Reply.java:172).  
 Shutting down the app server fails because the thread has a lock in 
 org.apache.derby.client.net.NetConnection40 and another task is calling 
 org.apache.derby.client.ClientPooledConnection.close(ClientPooledConnection.java:214)
  which is waiting for the lock.
 Killing the appsever using kill and then attempting to shutdown Derby 
 network server causes the Network Server to hang.  One of the threads hangs 
 waiting for a lock at 
 org.apache.derby.impl.drda.NeworkServerControlImpl.removeFromSessionTable(NetworkServerControlImpl.java:1525)
  and the main thread has this locked at 
 org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2242)
  and it itself is waiting for a lock which belongs to a thread that is stuck 
 at 
 org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:118)
  which is in the TIMED_WAITING state.
 Only by killing the Network Server using kill is possible at this point.
 There are transactions left even though all clients have been removed.  

--
This message is automatically 

RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs

2011-12-23 Thread Bergquist, Brett
It is the IndexStatisticsDaemonImpl that is invalidating the statement.  To 
verify this, I got my test case where it was in this loop.  With my printouts I 
was seeing that the isValid was being set by the thread that was stuck right 
after the timeout occurred and then I immediately saw makeInvalid being called 
with action = 40 (UPDATE_STATISTICS).  So I attached the debugger to the 
database engine and set a breakpoint in IndexStatisticsDaemonImpl  and then set 
 daemonDisabled to be false.   Immediately the lock timeout got reported 
and the loop terminated and my test case continued on the way it should.

So the IndexStatisticsDaemonImpl has the ability to invalidate statements in 
use by other threads.  In this case, it is being invoked right after lock 
timeout occurs and before the lock timeout exception is being processed.  The 
code sees that the statement is invalid and swallows the lock exception and 
sets up to recompile and run the statement again.

So the fix that I proposed and supplied a patch will handle the code not 
swallowing the exception, but I am now concerned if the 
IndexStatisticsDaemonImpl might have another problem if it is reaching into 
other threads :(

-Original Message-
From: Bergquist, Brett [mailto:bbergqu...@canoga.com] 
Sent: Friday, December 23, 2011 10:54 AM
To: derby-dev@db.apache.org
Subject: RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
ClientXADataSource and a deadlock or lock timeout occurs

Well, I see that at the start of the execution, even on the next retry, the 
isValid is true for the statement.  Something comes along and changes the state 
of isValid.  I am having trouble finding out what with the debugger because 
there are quite a few threads running (statistics daemon for example) and the 
makeInvalid method is being invoked from various places and threads.   So 
catching the correct one is being difficult to I am going to drop back to the 
old printf method and try to track the thread that sets the isValid to true 
and the thread that calls makeInvalid and try to match them up.

The fix definitely fixed my problem.  There is something in my code that is 
triggering the deadlock and it is that statement that you mention below that is 
the one that is causing problems.  But I want to understand why the fix is 
needed and how it gets into this state in the first place.  The I will look at 
my code to figure out why this statement causes a problem.

-Original Message-
From: Kathey Marsden (Commented) (JIRA) [mailto:j...@apache.org] 
Sent: Thursday, December 22, 2011 8:20 PM
To: derby-dev@db.apache.org
Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
ClientXADataSource and a deadlock or lock timeout occurs


[ 
https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175216#comment-13175216
 ] 

Kathey Marsden commented on DERBY-5552:
---

Yes, definitely you want your test to be the new Junit style so that is the 
place to look.

I think I have solved the mystery of the missing XA client  traces and filed 
DERBY-5553 for that issue.

I am trying to think of a scenario where we we execute a valid statement would 
get a lock timeout and then at the time of processing the exception the 
statement would have become invalid.  One reason for a statement becoming 
invalid would be for instance if a table had been altered, for example if we 
had a prepared statement SELECT * FROM TAB and then a column was added or 
dropped.  In that scenario the statement would be recompiled long before the 
lock timeout occurred.  If the problem statement is the one with the lock 
timeout it would seem to be:

 SELECT DISTINCT t3.ID, t3.OPLOCK FROM PKG_9145E10G.PLPM_PAIR t3, 
PKG_9145E10G.PLPM_CHASSIS_9145E10G_USER t2, CORE_V1.DEVICE_ENTITY_USER t1, 
CORE_V1.DEVICE_ENTITY t0 WHERE ((t0.ID = CAST (? AS INTEGER )) AND 
(((t2.PLPM_PAIR_ID = t3.ID) AND ((t2.ID = t1.ID) AND (t1.DTYPE = 
'PlpmChassis9145E10GUser'))) AND (t0.ID = t1.DEVICE_ENTITY_ID))) with 1 
parameters begin parameter #1: 249360 :end parameter 
ERROR 40XL2: A lock could not be obtained within the time requested.  The 
lockTable dump is: 
Thu Dec 22 07:03:07 PST 2011


I am not sure how that could have gone invalid between the initial execution 
and the lock time out.



 Derby threads hanging when using ClientXADataSource and a deadlock or lock 
 timeout occurs
 -

 Key: DERBY-5552
 URL: https://issues.apache.org/jira/browse/DERBY-5552
 Project: Derby
  Issue Type: Bug
  Components: Network Server
Affects Versions: 10.8.1.2
 Environment: Solaris 10, Glassfish V2.1.1,
Reporter: Brett Bergquist
Priority: Blocker
 Attachments

Re: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs

2011-12-23 Thread Kristian Waagan

On 23.12.2011 17:49, Bergquist, Brett wrote:

It is the IndexStatisticsDaemonImpl that is invalidating the statement.  To verify this, I got my 
test case where it was in this loop.  With my printouts I was seeing that the isValid was being set 
by the thread that was stuck right after the timeout occurred and then I immediately saw 
makeInvalid being called with action = 40 (UPDATE_STATISTICS).  So I attached the debugger to the 
database engine and set a breakpoint in IndexStatisticsDaemonImpl  and then set  
daemonDisabled to be false.   Immediately the lock timeout got reported and 
the loop terminated and my test case continued on the way it should.


I haven't followed this closely, just a few quick questions/observations 
to check if I'm understanding this correctly:

 o Does disabling the istat daemon suffice as a workaround?
 o With the istat deamon disabled, are you still seeing timeouts?


Thanks,
--
Kristian



So the IndexStatisticsDaemonImpl has the ability to invalidate statements in 
use by other threads.  In this case, it is being invoked right after lock 
timeout occurs and before the lock timeout exception is being processed.  The 
code sees that the statement is invalid and swallows the lock exception and 
sets up to recompile and run the statement again.

So the fix that I proposed and supplied a patch will handle the code not 
swallowing the exception, but I am now concerned if the 
IndexStatisticsDaemonImpl might have another problem if it is reaching into 
other threads :(

-Original Message-
From: Bergquist, Brett [mailto:bbergqu...@canoga.com]
Sent: Friday, December 23, 2011 10:54 AM
To: derby-dev@db.apache.org
Subject: RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
ClientXADataSource and a deadlock or lock timeout occurs

Well, I see that at the start of the execution, even on the next retry, the isValid is 
true for the statement.  Something comes along and changes the state of isValid.  I am 
having trouble finding out what with the debugger because there are quite a few threads 
running (statistics daemon for example) and the makeInvalid method is being invoked from 
various places and threads.   So catching the correct one is being difficult to I am 
going to drop back to the old printf method and try to track the thread that 
sets the isValid to true and the thread that calls makeInvalid and try to match them up.

The fix definitely fixed my problem.  There is something in my code that is 
triggering the deadlock and it is that statement that you mention below that is 
the one that is causing problems.  But I want to understand why the fix is 
needed and how it gets into this state in the first place.  The I will look at 
my code to figure out why this statement causes a problem.

-Original Message-
From: Kathey Marsden (Commented) (JIRA) [mailto:j...@apache.org]
Sent: Thursday, December 22, 2011 8:20 PM
To: derby-dev@db.apache.org
Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
ClientXADataSource and a deadlock or lock timeout occurs


 [ 
https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175216#comment-13175216
 ]

Kathey Marsden commented on DERBY-5552:
---

Yes, definitely you want your test to be the new Junit style so that is the 
place to look.

I think I have solved the mystery of the missing XA client  traces and filed 
DERBY-5553 for that issue.

I am trying to think of a scenario where we we execute a valid statement would get a lock 
timeout and then at the time of processing the exception the statement would have become 
invalid.  One reason for a statement becoming invalid would be for instance if a table 
had been altered, for example if we had a prepared statement SELECT * FROM 
TAB and then a column was added or dropped.  In that scenario the statement would 
be recompiled long before the lock timeout occurred.  If the problem statement is the one 
with the lock timeout it would seem to be:

  SELECT DISTINCT t3.ID, t3.OPLOCK FROM PKG_9145E10G.PLPM_PAIR t3, 
PKG_9145E10G.PLPM_CHASSIS_9145E10G_USER t2, CORE_V1.DEVICE_ENTITY_USER t1, 
CORE_V1.DEVICE_ENTITY t0 WHERE ((t0.ID = CAST (? AS INTEGER )) AND 
(((t2.PLPM_PAIR_ID = t3.ID) AND ((t2.ID = t1.ID) AND (t1.DTYPE = 
'PlpmChassis9145E10GUser'))) AND (t0.ID = t1.DEVICE_ENTITY_ID))) with 1 
parameters begin parameter #1: 249360 :end parameter
ERROR 40XL2: A lock could not be obtained within the time requested.  The 
lockTable dump is:
Thu Dec 22 07:03:07 PST 2011


I am not sure how that could have gone invalid between the initial execution 
and the lock time out.




Derby threads hanging when using ClientXADataSource and a deadlock or lock 
timeout occurs
-

 Key: DERBY-5552
 URL: https

RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs

2011-12-23 Thread Bergquist, Brett
If I disable the istat daemon, I still the the timeout but now the error is 
reported as it should and the transaction is aborted and my code continues.  
With the itstat deamon enabled, then the lock timeout is discovered and 
signaled by exception being thrown but between the time it gets caught and 
processed, the istat daemon has set the isValid property of the statement to 
false.  The original lock timeout exception is caught but then the 

Activation.checkStatementValidity()

is called in the exception handler.  This finds that the statement is not valid 
(isValid == false) and throws its own exception indicating that a recompilation 
is required, swallowing the original lock timeout exception in the process.  
This causes the existing statement to be rebuilt and re-executed but the locks 
are still present, the lock timeout will occur again, the lock timeout 
exception will be thrown, the istat daemon will set the isValid to false 
again, etc.  The process just continues forever.

The patch that I attached to the bug only calls

Activation.checkStatementValidity()

in the exception handler if the caught exception is not set to REPORT_ALWAYS.  
This seem okay and to make sense because if the code is handling an exception 
that is to be REPORT_ALWAYS, there is no sense looking for a statement that is 
invalid; just report the exception.

Brett


From: Kristian Waagan [kristian.waa...@oracle.com]
Sent: Friday, December 23, 2011 5:08 PM
To: derby-dev@db.apache.org
Subject: Re: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
ClientXADataSource and a deadlock or lock timeout occurs

On 23.12.2011 17:49, Bergquist, Brett wrote:
 It is the IndexStatisticsDaemonImpl that is invalidating the statement.  To 
 verify this, I got my test case where it was in this loop.  With my printouts 
 I was seeing that the isValid was being set by the thread that was stuck 
 right after the timeout occurred and then I immediately saw makeInvalid being 
 called with action = 40 (UPDATE_STATISTICS).  So I attached the debugger to 
 the database engine and set a breakpoint in IndexStatisticsDaemonImpl  and 
 then set  daemonDisabled to be false.   Immediately the lock timeout got 
 reported and the loop terminated and my test case continued on the way it 
 should.

I haven't followed this closely, just a few quick questions/observations
to check if I'm understanding this correctly:
  o Does disabling the istat daemon suffice as a workaround?
  o With the istat deamon disabled, are you still seeing timeouts?


Thanks,
--
Kristian


 So the IndexStatisticsDaemonImpl has the ability to invalidate statements in 
 use by other threads.  In this case, it is being invoked right after lock 
 timeout occurs and before the lock timeout exception is being processed.  The 
 code sees that the statement is invalid and swallows the lock exception and 
 sets up to recompile and run the statement again.

 So the fix that I proposed and supplied a patch will handle the code not 
 swallowing the exception, but I am now concerned if the 
 IndexStatisticsDaemonImpl might have another problem if it is reaching into 
 other threads :(

 -Original Message-
 From: Bergquist, Brett [mailto:bbergqu...@canoga.com]
 Sent: Friday, December 23, 2011 10:54 AM
 To: derby-dev@db.apache.org
 Subject: RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
 ClientXADataSource and a deadlock or lock timeout occurs

 Well, I see that at the start of the execution, even on the next retry, the 
 isValid is true for the statement.  Something comes along and changes the 
 state of isValid.  I am having trouble finding out what with the debugger 
 because there are quite a few threads running (statistics daemon for example) 
 and the makeInvalid method is being invoked from various places and threads.  
  So catching the correct one is being difficult to I am going to drop back to 
 the old printf method and try to track the thread that sets the isValid to 
 true and the thread that calls makeInvalid and try to match them up.

 The fix definitely fixed my problem.  There is something in my code that is 
 triggering the deadlock and it is that statement that you mention below that 
 is the one that is causing problems.  But I want to understand why the fix is 
 needed and how it gets into this state in the first place.  The I will look 
 at my code to figure out why this statement causes a problem.

 -Original Message-
 From: Kathey Marsden (Commented) (JIRA) [mailto:j...@apache.org]
 Sent: Thursday, December 22, 2011 8:20 PM
 To: derby-dev@db.apache.org
 Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
 ClientXADataSource and a deadlock or lock timeout occurs


  [ 
 https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175216#comment-13175216
  ]

 Kathey