Background:

*         A large database approximately 750G

*         derby.storage.pageCacheSize=64000

*         Inserts going into the database about 125/second

*         Other database updates and deletes are being performed at a lower 
rate.

A query is run by the customer that does not complete in a reasonable amount of 
time.   Unfortunately the customer's program retries this query up to 40 times 
compounding the problem.

The query retrieves a set of records for a period of time.  The query is 
constructed using the indexes of the table such that the rows should be in 
sorted order and an external sort not required.   Basically the query fires off 
periodically and requests the records for a specific hour of the day.   The 
next time it fires off and retrieves the next hour, etc.

The query has run successfully with other parameters repeatedly through out the 
week with no issues.  Only one set of query parameters caused the issue.   The 
same problem occurred exactly one week later again for the same time period of 
the day.

I was able to retrieve the database locally from the customer and I configured 
my server to time and time zone and ran the query with no issue.   So it does 
not appear to be an optimizer problem with it choosing the wrong path.

There is a script that is run at the customer's site to retrieve some data that 
I use for debugging these types of problems as access to the customer site is 
not possible due to security reasons.   One of the things that is captures is 
stack traces every 5 seconds for 1 minute.

The stack traces show no deadlocks and also seem to show that the query path is 
not doing a large external sort or anything like that.   Many of the threads 
are blocked however and they seem to be blocked at a low level.    For example 
many threads look like this:

"DRDAConnThread_98" #123 prio=5 os_prio=64 tid=0x00000001027b3800 nid=0x94 
waiting for monitor entry [0xfffffffd37afe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at 
org.apache.derby.impl.store.raw.data.FileContainer.pageValid(Unknown Source)
                - waiting to lock <0xfffffffd70d5d598> (a 
org.apache.derby.impl.store.raw.data.AllocationCache)
                at 
org.apache.derby.impl.store.raw.data.FileContainer.getUserPage(Unknown Source)
                at 
org.apache.derby.impl.store.raw.data.FileContainer.getPage(Unknown Source)
                at 
org.apache.derby.impl.store.raw.data.BaseContainerHandle.getPage(Unknown Source)
                at 
org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.latchPage(Unknown
 Source)
                at 
org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.SortResultSet.getRowFromResultSet(Unknown 
Source)
                at 
org.apache.derby.impl.sql.execute.SortResultSet.getNextRowFromRS(Unknown Source)
                at 
org.apache.derby.impl.sql.execute.SortResultSet.loadSorter(Unknown Source)
                at 
org.apache.derby.impl.sql.execute.SortResultSet.openCore(Unknown Source)
                at 
org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(Unknown Source)
                at 
org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
                at 
org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
                at 
org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
                - locked <0xfffffffd95a59900> (a 
org.apache.derby.impl.jdbc.EmbedConnection40)
                at 
org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
Source)
                at 
org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(Unknown Source)
                at 
org.apache.derby.iapi.jdbc.BrokeredPreparedStatement.execute(Unknown Source)
                at org.apache.derby.impl.drda.DRDAStatement.execute(Unknown 
Source)
                at 
org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
                at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)

and they are waiting on a thread that looks like this:

"DRDAConnThread_61" #86 prio=5 os_prio=64 tid=0x0000000102d4c000 nid=0x6f 
waiting for monitor entry [0xfffffffd3c7fe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at 
org.apache.derby.impl.store.raw.data.FileContainer.pageValid(Unknown Source)
                - locked <0xfffffffd70d5d598> (a 
org.apache.derby.impl.store.raw.data.AllocationCache)
                at 
org.apache.derby.impl.store.raw.data.FileContainer.getUserPage(Unknown Source)
                at 
org.apache.derby.impl.store.raw.data.FileContainer.getPage(Unknown Source)
                at 
org.apache.derby.impl.store.raw.data.BaseContainerHandle.getPage(Unknown Source)
                at 
org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.latchPage(Unknown
 Source)
                at 
org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
 Source)
                at 
org.apache.derby.impl.sql.execute.SortResultSet.getRowFromResultSet(Unknown 
Source)
                at 
org.apache.derby.impl.sql.execute.SortResultSet.getNextRowFromRS(Unknown Source)
                at 
org.apache.derby.impl.sql.execute.SortResultSet.loadSorter(Unknown Source)
                at 
org.apache.derby.impl.sql.execute.SortResultSet.openCore(Unknown Source)
                at 
org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(Unknown Source)
                at 
org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
                at 
org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
                at 
org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
                - locked <0xfffffffd90645fe8> (a 
org.apache.derby.impl.jdbc.EmbedConnection40)
                at 
org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
Source)
                at 
org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(Unknown Source)
                at 
org.apache.derby.iapi.jdbc.BrokeredPreparedStatement.execute(Unknown Source)
                at org.apache.derby.impl.drda.DRDAStatement.execute(Unknown 
Source)
                at 
org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
                at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)

In another stack trace capture  I see a thread as well:

"derby.rawStoreDaemon" #25 daemon prio=5 os_prio=64 tid=0x0000000100b57800 
nid=0x32 runnable [0xfffffffd551fe000]
   java.lang.Thread.State: RUNNABLE
                at java.io.FileDescriptor.sync(Native Method)
                at org.apache.derby.impl.io.DirRandomAccessFile.sync(Unknown 
Source)
                at 
org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
                at 
org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
                - locked <0xffffffff56bf1a18> (a 
org.apache.derby.impl.store.raw.data.RAFContainer4)
                at 
org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(Unknown
 Source)
                at 
org.apache.derby.impl.services.cache.ConcurrentCache.cleanEntry(Unknown Source)
                at 
org.apache.derby.impl.services.cache.BackgroundCleaner.performWork(Unknown 
Source)
                at 
org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
                at 
org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
                at 
org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
                at java.lang.Thread.run(Thread.java:745)

That one caught my attention because of the cache that appeared being "cleaned".

I don't see any errors in the derby log nor the application log.   The CPU 
utilization of the system goes way up when this is occurring however (around 
45% utilization by Derby).   So Derby is trying to do something but I cannot 
tell what.   While the utilization is up, other queries but with different time 
periods are completing and the insertions and other database updates are also 
working.

Every one of the 40 threads relating to the queries that the customer retried 
seem to be in this same call stack pattern.   It did appear that after hours of 
execution, some of these threads did progress to completion, however the system 
had to be restarted to completely clear the system.

One week later after the restart and about 100 such queries and the same issue 
occurred.   Again the same query for other time periods worked fine even during 
the issue.

So any ideas on where to look will be much appreciated.

________________________________
Canoga Perkins
20600 Prairie Street
Chatsworth, CA 91311
(818) 718-6300

This e-mail and any attached document(s) is confidential and is intended only 
for the review of the party to whom it is addressed. If you have received this 
transmission in error, please notify the sender immediately and discard the 
original message and any attachment(s).

Reply via email to