[ https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177292#comment-13177292 ]

Brett Bergquist commented on DERBY-5552:
----------------------------------------

I have found the cause of the problem.  When a lock timeout or deadlock is 
detected, the server calls XATransactionState.cleanupOnError, which looks like 
this:


    public void cleanupOnError(Throwable t) {

        if (t instanceof StandardException) {

            StandardException se = (StandardException) t;

            if (se.getSeverity() >= ExceptionSeverity.SESSION_SEVERITY) {
                popMe();
                return;
            }

            if (se.getSeverity() == ExceptionSeverity.TRANSACTION_SEVERITY) {

                synchronized (this) {
                    // disable use of the connection until it is cleaned up.
                    conn.setApplicationConnection(null);
                    notifyAll();
                    associationState = TRO_FAIL;
                    if (SQLState.DEADLOCK.equals(se.getMessageId()))
                        rollbackOnlyCode = XAException.XA_RBDEADLOCK;
                    else if (SQLState.LOCK_TIMEOUT.equals(se.getMessageId()))
                        rollbackOnlyCode = XAException.XA_RBTIMEOUT;
                    else
                        rollbackOnlyCode = XAException.XA_RBOTHER;
                }
            }
        }
    }

The problem is the line of code:

    conn.setApplicationConnection(null);

The problem occurs on the client side: when the SQLException is received, the 
client ends up calling Sqlca.getMessage() to retrieve the formatted exception 
message.  This makes a call back down to the server on the same connection and 
ends up invoking EmbedStatement.checkStatus(); since the EmbedConnection now 
has a null "applicationConnection", a noCurrentConnection error is thrown.  
The DRDA code that receives this exception while processing Sqlca.getMessage() 
decides that a protocol error has occurred and disconnects from the server.
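
To illustrate the failure mode, here is a minimal sketch of the shape of that 
server-side check.  This is not the actual Derby source; the field name and 
the error factory are assumptions mirroring the setter shown above:

    // Illustrative sketch only, not the actual Derby source.  The field
    // name "applicationConnection" and the Util.noCurrentConnection()
    // factory are assumptions based on the setter in cleanupOnError.
    protected void checkStatus() throws SQLException {
        // cleanupOnError called conn.setApplicationConnection(null), so
        // this reference is null even though the physical connection is
        // still open.  The client's Sqlca.getMessage() callback therefore
        // gets "no current connection" instead of the lock timeout or
        // deadlock error it was trying to format.
        if (applicationConnection == null) {
            throw Util.noCurrentConnection();
        }
    }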

The XA transaction that was in progress never has "end" called on it, and the 
XA transaction on the client side is now lost.  Derby is left with an XA 
transaction that will never end, which causes all kinds of havoc: Derby logs 
all new transactions in case the lost one ever does get rolled back, the file 
system fills up with transaction logs, restarting the database engine takes 
days, etc.

I have commented out the above line, and now the proper lock error is actually 
reported at the client.  I don't know yet whether doing so has any other 
ramifications, however.
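
Concretely, the change I am testing leaves the rest of cleanupOnError intact 
and only disables that one call (a sketch of the edit, not a reviewed patch):

    synchronized (this) {
        // disable use of the connection until it is cleaned up.
        // Commented out: nulling the application connection here is what
        // breaks the client's Sqlca.getMessage() callback.
        // conn.setApplicationConnection(null);
        notifyAll();
        associationState = TRO_FAIL;
        if (SQLState.DEADLOCK.equals(se.getMessageId()))
            rollbackOnlyCode = XAException.XA_RBDEADLOCK;
        else if (SQLState.LOCK_TIMEOUT.equals(se.getMessageId()))
            rollbackOnlyCode = XAException.XA_RBTIMEOUT;
        else
            rollbackOnlyCode = XAException.XA_RBOTHER;
    }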

                
> Derby threads hanging when using ClientXADataSource and a deadlock or lock 
> timeout occurs
> -----------------------------------------------------------------------------------------
>
>                 Key: DERBY-5552
>                 URL: https://issues.apache.org/jira/browse/DERBY-5552
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Server
>    Affects Versions: 10.8.1.2
>         Environment: Solaris 10, Glassfish V2.1.1,
>            Reporter: Brett Bergquist
>            Priority: Blocker
>         Attachments: appserverstack.txt, client.tar.Z, derby.log, 
> derbystackatshutdown.txt, execute.patch, transactionsleft.txt
>
>
> The issue arises when multiple XA transactions are run in parallel and 
> either a lock timeout or a deadlock is detected.  When this happens, the 
> connection is leaked in the Glassfish connection pool and the client thread 
> hangs in "org.apache.derby.client.net.Reply.fill(Reply.java:172)".  
> Shutting down the app server fails because that thread holds a lock on 
> "org.apache.derby.client.net.NetConnection40" and another task calling 
> "org.apache.derby.client.ClientPooledConnection.close(ClientPooledConnection.java:214)"
>  is waiting for the lock.
> Killing the app server using "kill" and then attempting to shut down the 
> Derby Network Server causes the Network Server to hang.  One thread hangs 
> waiting for a lock at 
> "org.apache.derby.impl.drda.NetworkServerControlImpl.removeFromSessionTable(NetworkServerControlImpl.java:1525)",
>  the "main" thread holds that lock at 
> "org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2242)"
>  and is itself waiting for a lock held by a thread that is stuck at 
> "org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:118)"
>  in the TIMED_WAITING state.
> At this point, only killing the Network Server with "kill" is possible.
> There are transactions left even though all clients have been removed.  
