On 23.12.2011 17:49, Bergquist, Brett wrote:
It is the IndexStatisticsDaemonImpl that is invalidating the statement.  To verify this, I got my 
test case where it was in this loop.  With my printouts I was seeing that the isValid was being set 
by the thread that was stuck right after the timeout occurred and then I immediately saw 
makeInvalid being called with action = 40 (UPDATE_STATISTICS).  So I attached the debugger to the 
database engine and set a breakpoint in IndexStatisticsDaemonImpl  and then set  
"daemonDisabled" to be "false".   Immediately the lock timeout got reported and 
the loop terminated and my test case continued on the way it should.

I haven't followed this closely, just a few quick questions/observations to check if I'm understanding this correctly:
 o Does disabling the istat daemon suffice as a workaround?
 o With the istat deamon disabled, are you still seeing timeouts?


Thanks,
--
Kristian


So the IndexStatisticsDaemonImpl has the ability to invalidate statements in 
use by other threads.  In this case, it is being invoked right after lock 
timeout occurs and before the lock timeout exception is being processed.  The 
code sees that the statement is invalid and swallows the lock exception and 
sets up to recompile and run the statement again.

So the fix that I proposed and supplied a patch will handle the code not 
swallowing the exception, but I am now concerned if the 
IndexStatisticsDaemonImpl might have another problem if it is reaching into 
other threads :(

-----Original Message-----
From: Bergquist, Brett [mailto:[email protected]]
Sent: Friday, December 23, 2011 10:54 AM
To: [email protected]
Subject: RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
ClientXADataSource and a deadlock or lock timeout occurs

Well, I see that at the start of the execution, even on the next retry, the isValid is 
true for the statement.  Something comes along and changes the state of isValid.  I am 
having trouble finding out what with the debugger because there are quite a few threads 
running (statistics daemon for example) and the makeInvalid method is being invoked from 
various places and threads.   So catching the correct one is being difficult to I am 
going to drop back to the old "printf" method and try to track the thread that 
sets the isValid to true and the thread that calls makeInvalid and try to match them up.

The fix definitely fixed my problem.  There is something in my code that is 
triggering the deadlock and it is that statement that you mention below that is 
the one that is causing problems.  But I want to understand why the fix is 
needed and how it gets into this state in the first place.  The I will look at 
my code to figure out why this statement causes a problem.

-----Original Message-----
From: Kathey Marsden (Commented) (JIRA) [mailto:[email protected]]
Sent: Thursday, December 22, 2011 8:20 PM
To: [email protected]
Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using 
ClientXADataSource and a deadlock or lock timeout occurs


     [ 
https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175216#comment-13175216
 ]

Kathey Marsden commented on DERBY-5552:
---------------------------------------

Yes, definitely you want your test to be the new Junit style so that is the 
place to look.

I think I have solved the mystery of the missing XA client  traces and filed 
DERBY-5553 for that issue.

I am trying to think of a scenario where we we execute a valid statement would get a lock 
timeout and then at the time of processing the exception the statement would have become 
invalid.  One reason for a statement becoming invalid would be for instance if a table 
had been altered, for example if we had a prepared statement "SELECT * FROM 
TAB" and then a column was added or dropped.  In that scenario the statement would 
be recompiled long before the lock timeout occurred.  If the problem statement is the one 
with the lock timeout it would seem to be:

  SELECT DISTINCT t3.ID, t3.OPLOCK FROM PKG_9145E10G.PLPM_PAIR t3, 
PKG_9145E10G.PLPM_CHASSIS_9145E10G_USER t2, CORE_V1.DEVICE_ENTITY_USER t1, 
CORE_V1.DEVICE_ENTITY t0 WHERE ((t0.ID = CAST (? AS INTEGER )) AND 
(((t2.PLPM_PAIR_ID = t3.ID) AND ((t2.ID = t1.ID) AND (t1.DTYPE = 
'PlpmChassis9145E10GUser'))) AND (t0.ID = t1.DEVICE_ENTITY_ID))) with 1 
parameters begin parameter #1: 249360 :end parameter
ERROR 40XL2: A lock could not be obtained within the time requested.  The 
lockTable dump is:
Thu Dec 22 07:03:07 PST 2011


I am not sure how that could have gone invalid between the initial execution 
and the lock time out.



Derby threads hanging when using ClientXADataSource and a deadlock or lock 
timeout occurs
-----------------------------------------------------------------------------------------

                 Key: DERBY-5552
                 URL: https://issues.apache.org/jira/browse/DERBY-5552
             Project: Derby
          Issue Type: Bug
          Components: Network Server
    Affects Versions: 10.8.1.2
         Environment: Solaris 10, Glassfish V2.1.1,
            Reporter: Brett Bergquist
            Priority: Blocker
         Attachments: appserverstack.txt, client.tar.Z, derby.log, 
derbystackatshutdown.txt, execute.patch, transactionsleft.txt


The issue arrives when multiple XA transactions are done in parallel and there is either 
a lock timeout or a lock deadlock detected.  When this happens the connection is leaked 
in the Glassfish connection pool and the client thread hangs in 
"org.apache.derby.client.netReply.fill(Reply.java:172)".
Shutting down the app server fails because the thread has a lock in 
"org.apache.derby.client.net.NetConnection40" and another task is calling 
"org.apache.derby.client.ClientPooledConnection.close(ClientPooledConnection.java:214)" 
which is waiting for the lock.
Killing the appsever using "kill" and then attempting to shutdown Derby network server causes the Network Server 
to hang.  One of the threads hangs waiting for a lock at 
"org.apache.derby.impl.drda.NeworkServerControlImpl.removeFromSessionTable(NetworkServerControlImpl.java:1525)" 
and the "main" thread has this locked at 
"org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2242)" and it 
itself is waiting for a lock which belongs to a thread that is stuck at 
"org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:118) which is in the TIMED_WAITING state.
Only by killing the Network Server using "kill" is possible at this point.
There are transactions left even though all clients have been removed.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Reply via email to