[ 
https://issues.apache.org/jira/browse/DERBY-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121792#comment-16121792
 ] 

Brett Bergquist commented on DERBY-6955:
----------------------------------------

A quick look at the LogToFile.flush() method, it looks like there is two 
possible "wait" conditions.  The first is waiting if the database is "frozen" 
and the second is waiting if the log is being flushed.  There is comments that 
there may be multiple threads trying to flush the log, so that would explain a 
bit why there are multiple threads that have acquired the lock: they have 
acquired the lock and are waiting.

I do not see any thread that indicates that the database is frozen.   However, 
out backup process does freeze the database, take a filesystem snapshot, and 
unfreeze the database.   It is possible that this occurred at this point.

I had questioned a while back that if a connection is lost while the database 
is frozen, the database should be unfrozen automatically.   I have seen the 
database become frozen and no connections can be made to unfreeze the database.



> All database threads stuck and not making any progress
> ------------------------------------------------------
>
>                 Key: DERBY-6955
>                 URL: https://issues.apache.org/jira/browse/DERBY-6955
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.10.2.0
>         Environment: Solaris 10
> Very large database (about 750G) with about 100 tx/second insertion rate
> Network Server 
>            Reporter: Brett Bergquist
>         Attachments: stacktrace1, stacktrace2
>
>
> A system in production got into a very strange state with none of the 
> database engine threads making any progress.   Initially a connection could 
> be made to the network server, but any query would hang.  The network server 
> also hung on shutdown.
> The lack of progresss was observed by generating 6 stack traces, 10 seconds 
> apart.  Each of these stack traces show the threads exactly as in the 
> previous stack trace.  I will attach 2 of the stack traces which are 10 
> seconds apart.
> About half of the threads were in a call chain the looks like:
> {code:java}
> "DRDAConnThread_161" #187 prio=5 os_prio=64 tid=0x0000000101f1c800 nid=0xd6 
> waiting on condition [0xfffffffcad0fe000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0xfffffffefa4524c8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
>       at 
> org.apache.derby.impl.services.cache.CacheEntry.waitUntilIdentityIsSet(Unknown
>  Source)
>       at 
> org.apache.derby.impl.services.cache.ConcurrentCache.getEntry(Unknown Source)
>       at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown 
> Source)
>       at 
> org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown 
> Source)
>       at 
> org.apache.derby.impl.store.access.RAMTransaction.findConglomerate(Unknown 
> Source)
>       at 
> org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown
>  Source)
>       at 
> org.apache.derby.impl.store.access.RAMTransaction.openStoreCost(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.CompilerContextImpl.getStoreCostController(Unknown
>  Source)
>       at 
> org.apache.derby.impl.sql.compile.FromBaseTable.getStoreCostController(Unknown
>  Source)
>       at org.apache.derby.impl.sql.compile.FromBaseTable.estimateCost(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.OptimizerImpl.estimateTotalCost(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.OptimizerImpl.costBasedCostOptimizable(Unknown
>  Source)
>       at 
> org.apache.derby.impl.sql.compile.OptimizerImpl.costOptimizable(Unknown 
> Source)
>       at org.apache.derby.impl.sql.compile.FromBaseTable.optimizeIt(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.ProjectRestrictNode.optimizeIt(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.OptimizerImpl.costPermutation(Unknown 
> Source)
>       at org.apache.derby.impl.sql.compile.SelectNode.optimize(Unknown Source)
>       at 
> org.apache.derby.impl.sql.compile.DMLStatementNode.optimizeStatement(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.compile.CursorNode.optimizeStatement(Unknown Source)
>       at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>       at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>       at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
>       at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedPreparedStatement40.<init>(Unknown 
> Source)
>       at org.apache.derby.jdbc.Driver40.newEmbedPreparedStatement(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown 
> Source)
>       - locked <0xffffffff12e07950> (a 
> org.apache.derby.impl.jdbc.EmbedConnection40)
>       at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown 
> Source)
>       at org.apache.derby.impl.drda.DRDAStatement.prepare(Unknown Source)
>       at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(Unknown 
> Source)
>       at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(Unknown 
> Source)
>       at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown 
> Source)
>       at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> {code}
> and the majority of the others look like:
> {code:java}
> "DRDAConnThread_118" #144 prio=5 os_prio=64 tid=0x0000000101072800 nid=0xab 
> in Object.wait() [0xfffffffd342fe000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at java.lang.Object.wait(Object.java:502)
>       at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>       - locked <0xfffffffd605393c0> (a 
> org.apache.derby.impl.store.raw.log.LogToFile)
>       at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>       at org.apache.derby.impl.store.raw.log.FileLogger.flush(Unknown Source)
>       at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Unknown 
> Source)
>       at org.apache.derby.impl.store.raw.xact.Xact.xa_commit(Unknown Source)
>       at org.apache.derby.impl.store.access.RAMTransaction.xa_commit(Unknown 
> Source)
>       at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doCommit(Unknown
>  Source)
>       at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.xaCommit(Unknown
>  Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.xa_commit(Unknown Source)
>       - locked <0xfffffffef72c71d8> (a 
> org.apache.derby.impl.jdbc.EmbedConnection40)
>       at org.apache.derby.jdbc.XATransactionState.xa_commit(Unknown Source)
>       - eliminated <0xfffffffefa7b2958> (a 
> org.apache.derby.jdbc.XATransactionState)
>       at org.apache.derby.jdbc.EmbedXAResource.commit(Unknown Source)
>       - locked <0xfffffffefa7b2958> (a 
> org.apache.derby.jdbc.XATransactionState)
>       - locked <0xffffffff5343b4c0> (a org.apache.derby.jdbc.EmbedXAResource)
>       at 
> org.apache.derby.impl.drda.DRDAXAProtocol.commitXATransaction(Unknown Source)
>       at org.apache.derby.impl.drda.DRDAXAProtocol.commitTransaction(Unknown 
> Source)
>       at org.apache.derby.impl.drda.DRDAXAProtocol.parseSYNCCTL(Unknown 
> Source)
>       at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown 
> Source)
>       at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> {code}
> An interesting thing with the second one is the lock on
> - locked <0xfffffffd605393c0> (a 
> org.apache.derby.impl.store.raw.log.LogToFile)
> Multiple threads claim to have locked this object which I find concerning as 
> one would assume that only one thread could have a lock at a time.   See the 
> attached stack trace and you will see that there are many threads that seem 
> to have locked this object and are in the same state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to