RE: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-22 Thread Bergquist, Brett
I opened DERBY-5552

https://issues.apache.org/jira/browse/DERBY-5552

I attached client side traces using traceLevel=2145 
(TRACE_XA_CALLS|TRACE_PROTOCOL_FLOWS|TRACE_CONNECTS|TRACE_CONNECTION_CALL).   I 
don't know if more is needed.

I have sever side traces but it is 93Mb uncompressed and 13Mb compressed.  Is 
there something I should look for in there to narrow down which trace files to 
include and upload?

I have attached jstack traces of the application server and the Derby Network 
Server at shutdown showing the hung threads.

I have attached a the output of select * from syscs_diag.transaction_table 
when there are no clients and no other database action showing transactions 
that are still present.

I am trying to narrow down a test case better but have not been able to at this 
point.This is repeatable with my J2EE application every time however with 
the test setup that I have.   Any further areas to look at with a debugger or 
outputting more tracing information will be greatly appreciated.

From: Katherine Marsden [mailto:kmarsdende...@sbcglobal.net]
Sent: Wednesday, December 21, 2011 7:25 PM
To: derby-dev@db.apache.org
Subject: Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

On 12/21/2011 3:14 PM, Bergquist, Brett wrote:
Will get to this tomorrow but I do see one comment in the code that I don't 
understand:

In DRDAConnThread.java, I see:

if (severity  CodePoint.SVRCOD_ERROR)
{
// For a session ending error  
CodePoint.SRVCOD_ERROR you cannot
// send a SQLERRRM. A CMDCHKRM 
is required.  In XA if there is a
// lock timeout it ends the 
whole session. I am not sure this
// is the correct behaviour but 
if it occurs we have to send
// a CMDCHKRM instead of SQLERRM
writeCMDCHKRM(severity);
}

So what does the comment In XA if there is a lock timeout it ends the whole 
session refer to.  Why would a lock timeout be any different than any other 
standard database error.  It is like this is hinting at what is happening.

This is a real XA transaction.

What I see is that after the timeout is hit (I see it hit in Timeout.java) the 
error is propagated to the app server.  The app server then attempts to get the 
error text (I don't have the code handy) which attempts to send a request back 
to the Derby.  This then fails with a No Connection error being returned back 
from Derby.  It is as if after this error, the connection between the app 
server and Derby is no longer once there this is hit.
I agree that would not be the correct behavior if a lock timeout killed  the 
session. As this is a server side comment it would imply that this is a problem 
with embedded as well as well, but hard to believe it would not have been 
exposed before now. Thanks for working on reproduction for this.  I don't see 
the comment in the original code import but the annotation is not clear as it 
mentions the back out of another fix, so I am  not sure who  first noticed this 
behavior.


Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Katherine Marsden

On 12/21/2011 6:14 AM, Bergquist, Brett wrote:


I have a problem with a deadlock with Derby 10.8.1.2 and Glassfish 
V2.1.1 but at this point I cannot tell if it is a Derby problem or a 
Glassfish problem so I am looking for some guidance.


A database pool is created using the 
org.apache.derby.jdbc.ClientXADataSource XADataSource.   Connection 
validation is turned on and uses auto-commit to validate the 
connection and attempts to close all connections on any validation 
failure.


The deadlock is triggered when I have a couple of transactions that 
cause a Derby locks deadlock.  Deadlock detection and tracing is 
turned on in Derby and it correctly detects the locking deadlock and 
terminates one of the transactions.   Soon after this, however,  
Glassfish becomes stuck.  Here is the only reference to Derby in the 
jstack output of Glassfish:


PMCollectionActivationManager.newPmTestResult daemon prio=3 
tid=0x000103926800 nid=0xaf runnable [0xfffc218fa000]


   java.lang.Thread.State: RUNNABLE

at java.net.SocketInputStream.socketRead0(Native Method)

at 
java.net.SocketInputStream.read(SocketInputStream.java:129)


at org.apache.derby.client.net.Reply.fill(Unknown Source)

at 
org.apache.derby.client.net.Reply.ensureALayerDataInBuffer(Unknown Source)


at 
org.apache.derby.client.net.Reply.readDssHeader(Unknown Source)


at 
org.apache.derby.client.net.Reply.startSameIdChainParse(Unknown Source)


at 
org.apache.derby.client.net.NetXAConnectionReply.readLocalXACommit(Unknown 
Source)


at 
org.apache.derby.client.net.NetXAConnection.readLocalXACommit_(Unknown 
Source)


at 
org.apache.derby.client.net.NetXAConnection.readCommit(Unknown Source)


at 
org.apache.derby.client.net.NetConnection.readXACommit_(Unknown Source)


at 
org.apache.derby.client.am.Connection.readCommit(Unknown Source)


at 
org.apache.derby.client.am.Connection.readAutoCommit(Unknown Source)


at 
org.apache.derby.client.am.Connection.flowClose(Unknown Source)


at 
org.apache.derby.client.am.Connection.closeResourcesX(Unknown Source)


at 
org.apache.derby.client.am.Connection.closeResources(Unknown Source)


- locked 0xfffc91831e18 (a 
org.apache.derby.client.net.NetConnection40)


at 
org.apache.derby.client.net.NetConnection.closeResources(Unknown Source)


- locked 0xfffc91831e18 (a 
org.apache.derby.client.net.NetConnection40)


at 
org.apache.derby.client.ClientPooledConnection.close(Unknown Source)


- locked 0xfffc91831cf8 (a 
org.apache.derby.client.ClientXAConnection40)


at 
org.apache.derby.client.ClientXAConnection.close(Unknown Source)


- locked 0xfffc91831cf8 (a 
org.apache.derby.client.ClientXAConnection40)


at 
com.sun.gjc.spi.ManagedConnection.destroy(ManagedConnection.java:276)




First thing I would suggest is move up to 10.8.2.2 just to get running 
on the latest.
I wonder is there indication in the derby.log what went wrong here?  If 
not, I would suggest enabling client and server side DRDA tracing to get 
a clue as to what went wrong with the protocol here.

http://wiki.apache.org/db-derby/ProtocolDebuggingTips


Maybe there is some issue with the logic that dermines if autocommit 
will generate flow.


   // precondition: autoCommit_ is true
void readAutoCommit() throws SqlException {
if (willAutoCommitGenerateFlow()) {
readCommit();
}
}

  public boolean willAutoCommitGenerateFlow() throws 
org.apache.derby.client.am.SqlException {

if (!autoCommit_) {
return false;
}
if (! allowLocalCommitRollback_()) {
return false;
}
return true;
}









RE: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Bergquist, Brett
I cannot upgrade to 10.8.8.2 at this point as this is in production without 
upgrade possibility.  A also at this point, I don't want to change to many 
variables and have the problem disappear.  Right now I can get it to reproduce 
pretty readily.

I was able to connect with a debug and indeed the thread is blocking on the 
java.net.SocketInputStream.read.  Through the debugger if I force an 
interrupt, I returns from the socket call with an I/O error.  So does this mean 
that the problem is in the derby server side since the socket is  valid and not 
closed before I issue the interrupt?

I am going to enable tracing and see what I find.

From: Katherine Marsden [mailto:kmarsdende...@sbcglobal.net]
Sent: Wednesday, December 21, 2011 11:14 AM
To: derby-dev@db.apache.org
Subject: Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

On 12/21/2011 6:14 AM, Bergquist, Brett wrote:
I have a problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1 but 
at this point I cannot tell if it is a Derby problem or a Glassfish problem so 
I am looking for some guidance.

A database pool is created using the org.apache.derby.jdbc.ClientXADataSource 
XADataSource.   Connection validation is turned on and uses auto-commit to 
validate the connection and attempts to close all connections on any validation 
failure.

The deadlock is triggered when I have a couple of transactions that cause a 
Derby locks deadlock.  Deadlock detection and tracing is turned on in Derby and 
it correctly detects the locking deadlock and terminates one of the 
transactions.   Soon after this, however,  Glassfish becomes stuck.  Here is 
the only reference to Derby in the jstack output of Glassfish:

PMCollectionActivationManager.newPmTestResult daemon prio=3 
tid=0x000103926800 nid=0xaf runnable [0xfffc218fa000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at org.apache.derby.client.net.Reply.fill(Unknown Source)
at 
org.apache.derby.client.net.Reply.ensureALayerDataInBuffer(Unknown Source)
at org.apache.derby.client.net.Reply.readDssHeader(Unknown 
Source)
at 
org.apache.derby.client.net.Reply.startSameIdChainParse(Unknown Source)
at 
org.apache.derby.client.net.NetXAConnectionReply.readLocalXACommit(Unknown 
Source)
at 
org.apache.derby.client.net.NetXAConnection.readLocalXACommit_(Unknown Source)
at 
org.apache.derby.client.net.NetXAConnection.readCommit(Unknown Source)
at 
org.apache.derby.client.net.NetConnection.readXACommit_(Unknown Source)
at org.apache.derby.client.am.Connection.readCommit(Unknown 
Source)
at org.apache.derby.client.am.Connection.readAutoCommit(Unknown 
Source)
at org.apache.derby.client.am.Connection.flowClose(Unknown 
Source)
at 
org.apache.derby.client.am.Connection.closeResourcesX(Unknown Source)
at org.apache.derby.client.am.Connection.closeResources(Unknown 
Source)
- locked 0xfffc91831e18 (a 
org.apache.derby.client.net.NetConnection40)
at 
org.apache.derby.client.net.NetConnection.closeResources(Unknown Source)
- locked 0xfffc91831e18 (a 
org.apache.derby.client.net.NetConnection40)
at org.apache.derby.client.ClientPooledConnection.close(Unknown 
Source)
- locked 0xfffc91831cf8 (a 
org.apache.derby.client.ClientXAConnection40)
at org.apache.derby.client.ClientXAConnection.close(Unknown 
Source)
- locked 0xfffc91831cf8 (a 
org.apache.derby.client.ClientXAConnection40)
at 
com.sun.gjc.spi.ManagedConnection.destroy(ManagedConnection.java:276)

First thing I would suggest is move up to 10.8.2.2 just to get running on the 
latest.
I wonder is there indication in the derby.log what went wrong here?  If not, I 
would suggest enabling client and server side DRDA tracing to get a clue as to 
what went wrong with the protocol here.
http://wiki.apache.org/db-derby/ProtocolDebuggingTips


Maybe there is some issue with the logic that dermines if autocommit will 
generate flow.

   // precondition: autoCommit_ is true
void readAutoCommit() throws SqlException {
if (willAutoCommitGenerateFlow()) {
readCommit();
}
}

  public boolean willAutoCommitGenerateFlow() throws 
org.apache.derby.client.am.SqlException {
if (!autoCommit_) {
return false;
}
if (! allowLocalCommitRollback_()) {
return false;
}
return true;
}








RE: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Bergquist, Brett
I'm having some trouble getting client side tracing to work.  The connections 
are managed by Glassfish connection pool so I don't know where to set the 
traceDirectory and traceLevel properties.   Can these be specified as 
properties like the password, etc.

Just a little more info as well.

Once of the threads got this error:

[#|2011-12-21T10:28:54.224-0800|INFO|sun-appserver2.1|javax.enterprise.system.container.ejb|_ThreadID=20;_ThreadName=httpSSLWorkerThread-8080-0;|
javax.ejb.TransactionRolledbackLocalException: Exception thrown from bean; 
nested exception is: Exception [EclipseLink-4002] (Eclipse Persistence Services 
- 1.1.3.v20091002-r5404): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLTransactionRollbackException: DERBY SQL error: 
SQLCODE: -1, SQLSTATE: 40XL2, SQLERRMC:

There was a long time where a lock could not be obtained and then next thing 
reported by the thread is:

[#|2011-12-21T10:28:54.269-0800|WARNING|sun-appserver2.1|javax.enterprise.system.core.transaction|_ThreadID=20;_ThreadName=httpSSLWorkerThread-8080-0;_RequestID=5e6bd3ac-516f-4d96-8018-baf3ce187f69;|JTS5064:
 Unexpected exception occurred while delisting the resource
org.apache.derby.client.am.XaException: XAER_RMFAIL : No current connection.
at 
org.apache.derby.client.net.NetXAResource.throwXAException(NetXAResource.java:766)
at 
org.apache.derby.client.net.NetXAResource.throwXAException(NetXAResource.java:663)
at 
org.apache.derby.client.net.NetXAResource.connectionClosedFailure(NetXAResource.java:933)
at 
org.apache.derby.client.net.NetXAResource.end(NetXAResource.java:230)
at com.sun.gjc.spi.XAResourceImpl.end(XAResourceImpl.java:100)
at 
com.sun.jts.jta.TransactionState.rollback(TransactionState.java:199)
at 
com.sun.jts.jtsxa.OTSResourceImpl.rollback(OTSResourceImpl.java:311)

And then later:

[#|2011-12-21T10:28:54.380-0800|SEVERE|sun-appserver2.1|javax.enterprise.system.core.transaction|_ThreadID=20;_ThreadName=httpSSLWorkerThread-8080-0;org.omg.CORBA.INTERNAL:
   vmcid: 0x0  minor code: 0 completed: 
Maybe;rollback;_RequestID=5e6bd3ac-516f-4d96-8018-baf3ce187f69;|JTS5031: 
Exception [org.omg.CORBA.INTERNAL:   vmcid: 0x0  minor code: 0 completed: 
Maybe] on Resource [rollback] operation.|#]

And then later:

[#|2011-12-21T10:28:54.382-0800|WARNING|sun-appserver2.1|javax.enterprise.resource.resourceadapter|_ThreadID=20;_ThreadName=httpSSLWorkerThread-8080-0;_RequestID=5e6bd3ac-516f-4d96-8018-baf3ce187f69;|RAR7115:
 Unable to set ClientInfo for connection
java.sql.SQLClientInfoException: No current connection.
at 
org.apache.derby.client.net.NetConnection40.setClientInfo(NetConnection40.java:274)
at 
org.apache.derby.client.am.LogicalConnection40.setClientInfo(LogicalConnection40.java:254)
at 
com.sun.gjc.spi.jdbc40.ConnectionHolder40.setClientInfo(ConnectionHolder40.java:304)
at 
com.sun.gjc.spi.jdbc40.ConnectionHolder40.close(ConnectionHolder40.java:513)
at 
org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.closeDatasourceConnection(DatabaseAccessor.java:450)
at 
org.eclipse.persistence.internal.databaseaccess.DatasourceAccessor.closeConnection(DatasourceAccessor.java:488)

Another thread saw:

[#|2011-12-21T10:28:54.315-0800|INFO|sun-appserver2.1|javax.enterprise.system.container.ejb|_ThreadID=75;_ThreadName=httpSSLWorkerThread-8080-3;|
javax.ejb.TransactionRolledbackLocalException: Exception thrown from bean; 
nested exception is: Exception [EclipseLink-4002] (Eclipse Persistence Services 
- 1.1.3.v20091002-r5404): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLTransactionRollbackException: DERBY SQL error: 
SQLCODE: -1, SQLSTATE: 40XL2, SQLERRMC:
Wed Dec 21 10:28:54 PST 2011

And then saw:

[#|2011-12-21T10:28:54.326-0800|WARNING|sun-appserver2.1|javax.enterprise.system.core.transaction|_ThreadID=75;_ThreadName=httpSSLWorkerThread-8080-3;_RequestID=b2c68a79-1eb1-497d-aab5-f3875f656ed4;|JTS5064:
 Unexpected exception occurred while delisting the resource
java.lang.NullPointerException
at 
org.apache.derby.client.net.Request.sendBytes(Request.java:1289)
at org.apache.derby.client.net.Request.flush(Request.java:1283)
at 
org.apache.derby.client.net.NetAgent.sendRequest(NetAgent.java:393)
at 
org.apache.derby.client.net.NetAgent.flush_(NetAgent.java:273)
at 
org.apache.derby.client.am.Agent.flowOutsideUOW(Agent.java:196)
at 
org.apache.derby.client.net.NetXAResource.end(NetXAResource.java:242)
at com.sun.gjc.spi.XAResourceImpl.end(XAResourceImpl.java:100)

and then saw:


Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Katherine Marsden

On 12/21/2011 8:34 AM, Bergquist, Brett wrote:


I cannot upgrade to 10.8.8.2 at this point as this is in production 
without upgrade possibility.  A also at this point, I don't want to 
change to many variables and have the problem disappear.  Right now I 
can get it to reproduce pretty readily.


I was able to connect with a debug and indeed the thread is blocking 
on the java.net.SocketInputStream.read.  Through the debugger if I 
force an interrupt, I returns from the socket call with an I/O error.  
So does this mean that the problem is in the derby server side since 
the socket is  valid and not closed before I issue the interrupt?


If you are not seeing the expected SYNCCRD response from the server in 
your trace, I would expect it is either a) A problem on the server side 
that it is not being sent or b) a problem on the client side in terms of 
determining wither this should flow.  Was there anything of interest in 
the derby.log?


I would suggest you go ahead and file a Jira issue so you can attach the 
trace or trace excerpts and reproduction when you get one.  Please put a 
pointer to this email thread in there.


Thanks

Kathey




Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Katherine Marsden

On 12/21/2011 11:20 AM, Bergquist, Brett wrote:


I'm having some trouble getting client side tracing to work.  The 
connections are managed by Glassfish connection pool so I don't know 
where to set the traceDirectory and traceLevel properties.   Can these 
be specified as properties like the password, etc.


They  can be set on the connection URL or  with undocumented system 
properties, documented here #:)

http://wiki.apache.org/db-derby/UndocumentedDerbyBehavior

Looking at the info, again I am curious if there are corresponding 
server side traces in the derby.log.
Also it would be interesting to see if there are at this point any XA 
Transactions in need of recovery in the database.

Just  exit your application and connect  with ij and run:

select * from SYSCS_DIAG.TRANSACTION_TABLE ;




RE: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Bergquist, Brett
Nothing in the Derby log other than it logging a deadlock with the statements 
and a lock timeout with its statements and it indicating that cleanup had 
started and completed.

I will enable tracing with the documented (undocumented system property).  
Thanks for that information!

I will check for the XA transactions the next time I reproduce this.

Maybe you could point me into the correct area to look.  This seems to be 
triggered either through a lock timeout or a deadlock.   The connection that 
this is occurring through is an XA connection.   I see the logging of this in 
the server log but I am trying to find out where that would be logged from.   
It seems after this occurs and because of the way connection pool is being 
validated and recreated on error by Glassfish (configured to do so), it gets 
into this state.  What I don't understand is why this type of error would cause 
the connection to appear to be invalid and I am trying to work through both the 
Glassfish source and the Derby source to find out.   The connection is 
correctly handling other errors such as a duplication trying to be inserted and 
this does not trigger the connection to appear to be invalid.So I am trying 
to understand why a lock timeout or deadlock detection might do so.

This problem has only cropped up recently when they started performing multiple 
requests that I know have a deadlock path through them.  I can fix that problem 
later but this is a system level problem that I need to resolve.

I really do appreciate the help and guidance and am willing to try to work 
though this.   I have to figure this out and either patch Glassfish or Derby in 
any case as my customer (think very very large wireless carrier) is getting 
pretty PO'ed.

From: Katherine Marsden [mailto:kmarsdende...@sbcglobal.net]
Sent: Wednesday, December 21, 2011 2:46 PM
To: derby-dev@db.apache.org
Subject: Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

On 12/21/2011 11:20 AM, Bergquist, Brett wrote:
I'm having some trouble getting client side tracing to work.  The connections 
are managed by Glassfish connection pool so I don't know where to set the 
traceDirectory and traceLevel properties.   Can these be specified as 
properties like the password, etc.
They  can be set on the connection URL or  with undocumented system properties, 
documented here #:)
http://wiki.apache.org/db-derby/UndocumentedDerbyBehavior

Looking at the info, again I am curious if there are corresponding server side 
traces in the derby.log.
Also it would be interesting to see if there are at this point any XA 
Transactions in need of recovery in the database.
Just  exit your application and connect  with ij and run:

select * from SYSCS_DIAG.TRANSACTION_TABLE ;



RE: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Bergquist, Brett
After shutting down the application server but leaving the database engine 
running, it still has transactions:

6789036870
(4871251,2b0425fa2e6273756e332d7369743233372c7365727665722c5033373030,73756e332d7369743233372c7365727665722c50333730302c00)
CSEMUserTransaction   ACTIVE (1735812,468154)   
  null
6789064593
(4871251,151225fa2e6273756e332d7369743233372c7365727665722c5033373030,73756e332d7369743233372c7365727665722c50333730302c00)
CSEMUserTransaction   ACTIVE null   select 
max(csid) from ( select min(cs.id) as csid, min(cs.configuration_number) as 
csnum, cbe.id as cbid from --DERBY-PROPERTIES joinOrder=FIXED/n 
core_v1.configurable_hardware ch join core_v1.configuration_set cs on 
cs.configurable_hardware_id = ch.id join 
core_v1.configurationset_configurationbundle cscb on cscb.configurationset_id = 
cs.id join core_v1.configuration_bundle cb on cb.id = 
cscb.configurationbundle_id join pkg_9145e10g.configuration_bundle_9145e10g cbe 
on cbe.id = cb.id where ch.id = ? and cb.bundle_name = ? group by cbe.id ) as 
lcs
6789041545
(4871251,230625fa2e6273756e332d7369743233372c7365727665722c5033373030,73756e332d7369743233372c7365727665722c50333730302c00)
CSEMUserTransaction   ACTIVE null   select 
max(csid) from ( select min(cs.id) as csid, min(cs.configuration_number) as 
csnum, cbe.id as cbid from --DERBY-PROPERTIES joinOrder=FIXED/n 
core_v1.configurable_hardware ch join core_v1.configuration_set cs on 
cs.configurable_hardware_id = ch.id join 
core_v1.configurationset_configurationbundle cscb on cscb.configurationset_id = 
cs.id join core_v1.configuration_bundle cb on cb.id = 
cscb.configurationbundle_id join pkg_9145e10g.configuration_bundle_9145e10g cbe 
on cbe.id = cb.id where ch.id = ? and cb.bundle_name = ? group by cbe.id ) as 
lcs
6789042080 null   CSEMUserTransaction   ACTIVE null 
  SELECT t0.ID, t0.DTYPE, t0.BUNDLE_NAME, t0.OPLOCK, t1.ID, t2.ID, 
t2.PM_END_DELAY, t2.PM_BETWEEN_TIME, t2.PM_DIS_OVER_TIME, t2.PM_DIS_END_DELAY, 
t2.PM_SCHEDULER_POLICY, t2.PM_SCHEDULER_STATE FROM 
CORE_V1.CONFIGURATIONSET_CONFIGURATIONBUNDLE t4, CORE_V1.CONFIGURATION_SET t3, 
PKG_9145E10G.PM_SCHEDULER_CONFIG_BUNDLE t2, 
PKG_9145E10G.CONFIGURATION_BUNDLE_9145E10G t1, CORE_V1.CONFIGURATION_BUNDLE t0 
WHERE t3.ID = CAST (? AS INTEGER )) AND (t0.BUNDLE_NAME = CAST (? AS 
VARCHAR(32672) ))) AND (((t2.ID = t0.ID) AND (t1.ID = t0.ID)) AND (t0.DTYPE = 
'PM_SCHEDULER_CONFIG_BUNDLE_9145E10G'))) AND ((t4.CONFIGURATIONBUNDLE_ID = 
t0.ID) AND (t3.ID = t4.CONFIGURATIONSET_ID)))




From: Katherine Marsden [mailto:kmarsdende...@sbcglobal.net]
Sent: Wednesday, December 21, 2011 2:46 PM
To: derby-dev@db.apache.org
Subject: Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

On 12/21/2011 11:20 AM, Bergquist, Brett wrote:
I'm having some trouble getting client side tracing to work.  The connections 
are managed by Glassfish connection pool so I don't know where to set the 
traceDirectory and traceLevel properties.   Can these be specified as 
properties like the password, etc.
They  can be set on the connection URL or  with undocumented system properties, 
documented here #:)
http://wiki.apache.org/db-derby/UndocumentedDerbyBehavior

Looking at the info, again I am curious if there are corresponding server side 
traces in the derby.log.
Also it would be interesting to see if there are at this point any XA 
Transactions in need of recovery in the database.
Just  exit your application and connect  with ij and run:

select * from SYSCS_DIAG.TRANSACTION_TABLE ;



RE: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Bergquist, Brett
This also triggers another problem that I am having in production.  If these 
transactions are left in this state, Derby will build thousands of transaction 
files in the log database.  If the server is not stopped and restarted soon 
after these start occurring, then it may take days for Derby to boot the 
database.

I assume that these are XA transactions that did not complete.  I have 
Glassfish to timeout transactions after 15 minutes but this appears not to be 
working.  I would like to set the derby.jdbc.xaTransactionTimeout property 
but I am leery because of

https://issues.apache.org/jira/browse/DERBY-4109



From: Bergquist, Brett [mailto:bbergqu...@canoga.com]
Sent: Wednesday, December 21, 2011 3:33 PM
To: derby-dev@db.apache.org
Subject: RE: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

After shutting down the application server but leaving the database engine 
running, it still has transactions:

6789036870
(4871251,2b0425fa2e6273756e332d7369743233372c7365727665722c5033373030,73756e332d7369743233372c7365727665722c50333730302c00)
CSEMUserTransaction   ACTIVE (1735812,468154)   
  null
6789064593
(4871251,151225fa2e6273756e332d7369743233372c7365727665722c5033373030,73756e332d7369743233372c7365727665722c50333730302c00)
CSEMUserTransaction   ACTIVE null   select 
max(csid) from ( select min(cs.id) as csid, min(cs.configuration_number) as 
csnum, cbe.id as cbid from --DERBY-PROPERTIES joinOrder=FIXED/n 
core_v1.configurable_hardware ch join core_v1.configuration_set cs on 
cs.configurable_hardware_id = ch.id join 
core_v1.configurationset_configurationbundle cscb on cscb.configurationset_id = 
cs.id join core_v1.configuration_bundle cb on cb.id = 
cscb.configurationbundle_id join pkg_9145e10g.configuration_bundle_9145e10g cbe 
on cbe.id = cb.id where ch.id = ? and cb.bundle_name = ? group by cbe.id ) as 
lcs
6789041545
(4871251,230625fa2e6273756e332d7369743233372c7365727665722c5033373030,73756e332d7369743233372c7365727665722c50333730302c00)
CSEMUserTransaction   ACTIVE null   select 
max(csid) from ( select min(cs.id) as csid, min(cs.configuration_number) as 
csnum, cbe.id as cbid from --DERBY-PROPERTIES joinOrder=FIXED/n 
core_v1.configurable_hardware ch join core_v1.configuration_set cs on 
cs.configurable_hardware_id = ch.id join 
core_v1.configurationset_configurationbundle cscb on cscb.configurationset_id = 
cs.id join core_v1.configuration_bundle cb on cb.id = 
cscb.configurationbundle_id join pkg_9145e10g.configuration_bundle_9145e10g cbe 
on cbe.id = cb.id where ch.id = ? and cb.bundle_name = ? group by cbe.id ) as 
lcs
6789042080 null   CSEMUserTransaction   ACTIVE null 
  SELECT t0.ID, t0.DTYPE, t0.BUNDLE_NAME, t0.OPLOCK, t1.ID, t2.ID, 
t2.PM_END_DELAY, t2.PM_BETWEEN_TIME, t2.PM_DIS_OVER_TIME, t2.PM_DIS_END_DELAY, 
t2.PM_SCHEDULER_POLICY, t2.PM_SCHEDULER_STATE FROM 
CORE_V1.CONFIGURATIONSET_CONFIGURATIONBUNDLE t4, CORE_V1.CONFIGURATION_SET t3, 
PKG_9145E10G.PM_SCHEDULER_CONFIG_BUNDLE t2, 
PKG_9145E10G.CONFIGURATION_BUNDLE_9145E10G t1, CORE_V1.CONFIGURATION_BUNDLE t0 
WHERE t3.ID = CAST (? AS INTEGER )) AND (t0.BUNDLE_NAME = CAST (? AS 
VARCHAR(32672) ))) AND (((t2.ID = t0.ID) AND (t1.ID = t0.ID)) AND (t0.DTYPE = 
'PM_SCHEDULER_CONFIG_BUNDLE_9145E10G'))) AND ((t4.CONFIGURATIONBUNDLE_ID = 
t0.ID) AND (t3.ID = t4.CONFIGURATIONSET_ID)))




From: Katherine Marsden [mailto:kmarsdende...@sbcglobal.net]
Sent: Wednesday, December 21, 2011 2:46 PM
To: derby-dev@db.apache.org
Subject: Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

On 12/21/2011 11:20 AM, Bergquist, Brett wrote:
I'm having some trouble getting client side tracing to work.  The connections 
are managed by Glassfish connection pool so I don't know where to set the 
traceDirectory and traceLevel properties.   Can these be specified as 
properties like the password, etc.
They  can be set on the connection URL or  with undocumented system properties, 
documented here #:)
http://wiki.apache.org/db-derby/UndocumentedDerbyBehavior

Looking at the info, again I am curious if there are corresponding server side 
traces in the derby.log.
Also it would be interesting to see if there are at this point any XA 
Transactions in need of recovery in the database.
Just  exit your application and connect  with ij and run:

select * from SYSCS_DIAG.TRANSACTION_TABLE ;


Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Katherine Marsden

On 12/21/2011 12:04 PM, Bergquist, Brett wrote:


Nothing in the Derby log other than it logging a deadlock with the 
statements and a lock timeout with its statements and it indicating 
that cleanup had started and completed.


I will enable tracing with the documented (undocumented system 
property).  Thanks for that information!


I will check for the XA transactions the next time I reproduce this.

Maybe you could point me into the correct area to look.  This seems to 
be triggered either through a lock timeout or a deadlock.   The 
connection that this is occurring through is an XA connection.   I see 
the logging of this in the server log but I am trying to find out 
where that would be logged from.   It seems after this occurs and 
because of the way connection pool is being validated and recreated on 
error by Glassfish (configured to do so), it gets into this state.  
What I don't understand is why this type of error would cause the 
connection to appear to be invalid and I am trying to work through 
both the Glassfish source and the Derby source to find out.   The 
connection is correctly handling other errors such as a duplication 
trying to be inserted and this does not trigger the connection to 
appear to be invalid.So I am trying to understand why a lock 
timeout or deadlock detection might do so.


This problem has only cropped up recently when they started performing 
multiple requests that I know have a deadlock path through them.  I 
can fix that problem later but this is a system level problem that I 
need to resolve.


I really do appreciate the help and guidance and am willing to try to 
work though this.   I have to figure this out and either patch 
Glassfish or Derby in any case as my customer (think very very large 
wireless carrier) is getting pretty PO'ed.


The one thing I think of specifically with a deadlock is that it will 
automatically rollback the victim transaction and that might throw off 
this client logic regarding the state of the server.But I would 
think if there were just a simple problem with deadlocks it would have 
showed up before now. That said I don't see any specific tests in our XA 
tests: org.apache.derbyTesting.functionTests.tests.jdbapi.XATest or 
org.apache.derbyTesting.functionTests.tests.jdbcapi.XATransactionTest  
that test XAConnections with deadlocks.


  Is this a local transaction on an XA connection or a real XA 
transaction with two phase commit?


You might want to try to test and an XAConnection with  a simple 
deadlock case locally to see if that pops a reproduction.   
org.apache.derbyTesting.functionTests.tests.lang.DeadlockDetectionTest 
and org.apache.derbyTesting.functionTests.tests.lang have  some examples 
of deadlocks.


HTH,

Kathey





Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Katherine Marsden

On 12/21/2011 12:40 PM, Bergquist, Brett wrote:


This also triggers another problem that I am having in production.  If 
these transactions are left in this state, Derby will build thousands 
of transaction files in the log database.  If the server is not 
stopped and restarted soon after these start occurring, then it may 
take days for Derby to boot the database.


Uncompleted XA transactions will persist even after the the server is 
stopped and have to be recovered XAResource.recover()  and told 
explicitly whether to commit or rollback.  An example of using 
XAResource.recover() is in XATest  testInterleavingTransactions().  Once 
the core problem is determined, you should run an xa  recovery 
operation  to clean up the residual transactions.


I assume that these are XA transactions that did not complete.  I have 
Glassfish to timeout transactions after 15 minutes but this appears 
not to be working.  I would like to set the 
derby.jdbc.xaTransactionTimeout property but I am leery because of


https://issues.apache.org/jira/browse/DERBY-4109

I can't really say to much about DERBY-4109 as we don't have a 
reproduction and I don't understand the circumstances under which it 
occurs, but in your case, I tend to think extending  timeout is  not 
going to help as it seems  you have hit a protocol exchange where the 
server will never respond to the client.


Looking forward to seeing the excerpts from the client and server traces 
after the deadlock and hearing how your simple deadlock test goes.  If 
you need a simple standalone template for XA to start from, I think 
ReproDerby1016.java attached to DERBY-1016 might work.


Thanks

Kathey


Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Katherine Marsden

On 12/21/2011 1:31 PM, Katherine Marsden wrote:

One more random bit of information for your test case.  I recall when I 
needed to add an XA Specific test case adding the code below to 
SavepointJdbc30Test to get the existing test to run as well with XA. You 
might try doing that with the existing deadlock test cases to pop the 
issue.  I don't think though that it tests two phase commit.




// Repeat the embedded tests obtaining a connection from
// an XA data source if it is supported. This is not supported
// under JSR169.
if (JDBC.vmSupportsJDBC3()) {
embedded = new TestSuite(
SavepointJdbc30_JSR169Test:embedded XADataSource);
embedded.addTestSuite(SavepointJdbc30Test.class);
embedded.addTest(getEmbeddedSuite(SavepointJdbc30_JSR169Test:
+ embedded only XADataSource));

suite.addTest(TestConfiguration.connectionXADecorator(embedded));
// Repeat the client tests obtaining a 
connection from

// an XA data source if it is supported. This is not supported
// under JSR169.
client = new TestSuite(SavepointJdbc30_JSR169Test:client 
XADatasource);

client.addTestSuite(SavepointJdbc30Test.class);

suite.addTest(TestConfiguration.clientServerDecorator(TestConfiguration.connectionXADecorator(client))); 


}

It would be neat to have modes  in which all the tests could be run 
using different types of connections, etc.  
https://issues.apache.org/jira/browse/DERBY-1134


Kathey


RE: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Bergquist, Brett
Will get to this tomorrow but I do see one comment in the code that I don't 
understand:

In DRDAConnThread.java, I see:

if (severity  CodePoint.SVRCOD_ERROR)
{
// For a session ending error  
CodePoint.SRVCOD_ERROR you cannot
// send a SQLERRRM. A CMDCHKRM 
is required.  In XA if there is a
// lock timeout it ends the 
whole session. I am not sure this
// is the correct behaviour but 
if it occurs we have to send
// a CMDCHKRM instead of SQLERRM
writeCMDCHKRM(severity);
}

So what does the comment In XA if there is a lock timeout it ends the whole 
session refer to.  Why would a lock timeout be any different than any other 
standard database error.  It is like this is hinting at what is happening.

This is a real XA transaction.

What I see is that after the timeout is hit (I see it hit in Timeout.java) the 
error is propagated to the app server.  The app server then attempts to get the 
error text (I don't have the code handy) which attempts to send a request back 
to the Derby.  This then fails with a No Connection error being returned back 
from Derby.  It is as if after this error, the connection between the app 
server and Derby is no longer once there this is hit.

I am going to go try to follow through the code and get a smaller reproducible 
example.

From: Katherine Marsden [mailto:kmarsdende...@sbcglobal.net]
Sent: Wednesday, December 21, 2011 3:41 PM
To: derby-dev@db.apache.org
Subject: Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

On 12/21/2011 12:04 PM, Bergquist, Brett wrote:
Nothing in the Derby log other than it logging a deadlock with the statements 
and a lock timeout with its statements and it indicating that cleanup had 
started and completed.

I will enable tracing with the documented (undocumented system property).  
Thanks for that information!

I will check for the XA transactions the next time I reproduce this.

Maybe you could point me into the correct area to look.  This seems to be 
triggered either through a lock timeout or a deadlock.   The connection that 
this is occurring through is an XA connection.   I see the logging of this in 
the server log but I am trying to find out where that would be logged from.   
It seems after this occurs and because of the way connection pool is being 
validated and recreated on error by Glassfish (configured to do so), it gets 
into this state.  What I don't understand is why this type of error would cause 
the connection to appear to be invalid and I am trying to work through both the 
Glassfish source and the Derby source to find out.   The connection is 
correctly handling other errors such as a duplication trying to be inserted and 
this does not trigger the connection to appear to be invalid.So I am trying 
to understand why a lock timeout or deadlock detection might do so.

This problem has only cropped up recently when they started performing multiple 
requests that I know have a deadlock path through them.  I can fix that problem 
later but this is a system level problem that I need to resolve.

I really do appreciate the help and guidance and am willing to try to work 
though this.   I have to figure this out and either patch Glassfish or Derby in 
any case as my customer (think very very large wireless carrier) is getting 
pretty PO'ed.
The one thing I think of specifically with a deadlock is that it will 
automatically rollback the victim transaction and that might throw off this 
client logic regarding the state of the server.But I would think if there 
were just a simple problem with deadlocks it would have showed up before now. 
That said I don't see any specific tests in our XA tests: 
org.apache.derbyTesting.functionTests.tests.jdbapi.XATest or 
org.apache.derbyTesting.functionTests.tests.jdbcapi.XATransactionTest  that 
test XAConnections with deadlocks.

  Is this a local transaction on an XA connection or a real XA transaction with 
two phase commit?

You might want to try to test and an XAConnection with  a simple deadlock case 
locally to see if that pops a reproduction.   
org.apache.derbyTesting.functionTests.tests.lang.DeadlockDetectionTest and 
org.apache.derbyTesting.functionTests.tests.lang have  some examples of 
deadlocks.

HTH,

Kathey




Re: Problem with a deadlock with Derby 10.8.1.2 and Glassfish V2.1.1

2011-12-21 Thread Katherine Marsden

On 12/21/2011 3:14 PM, Bergquist, Brett wrote:


Will get to this tomorrow but I do see one comment in the code that I 
don't understand:


In DRDAConnThread.java, I see:

if (severity  CodePoint.SVRCOD_ERROR)

{

// For a session 
ending error  CodePoint.SRVCOD_ERROR you cannot


// send a SQLERRRM. A 
CMDCHKRM is required.  In XA if there is a


// lock timeout it 
ends the whole session. I am not sure this


// is the correct 
behaviour but if it occurs we have to send


// a CMDCHKRM instead 
of SQLERRM


writeCMDCHKRM(severity);

}

So what does the comment In XA if there is a lock timeout it ends the 
whole session refer to.  Why would a lock timeout be any different 
than any other standard database error.  It is like this is hinting at 
what is happening.


This is a real XA transaction.

What I see is that after the timeout is hit (I see it hit in 
Timeout.java) the error is propagated to the app server.  The app 
server then attempts to get the error text (I don't have the code 
handy) which attempts to send a request back to the Derby.  This then 
fails with a No Connection error being returned back from Derby.  It 
is as if after this error, the connection between the app server and 
Derby is no longer once there this is hit.


I agree that would not be the correct behavior if a lock timeout killed  
the session. As this is a server side comment it would imply that this 
is a problem with embedded as well as well, but hard to believe it would 
not have been exposed before now. Thanks for working on reproduction for 
this.  I don't see the comment in the original code import but the 
annotation is not clear as it mentions the back out of another fix, so I 
am  not sure who  first noticed this behavior.