Sorry to be persistent, but this is a very serious issue that I am seeing 
in production, and I would just like some guidance on where to look in the 
source for a possible cause.

My assumption is that the optimizer may get into a state where it takes a 
very long time to complete, or may even be looping. I say this because of 
the stack traces that I have, the fact that at least one long-running 
instance did eventually complete, and the fact that this same query is 
executed thousands of times a day without issue.

The query joins multiple tables (about 8, if I remember correctly, but I can 
get the details), so there could be many query plans, but there are proper 
indexes on each table. The statistics are also up to date, and the same 
query was being executed successfully over and over during the same period 
in which the 5 instances were stuck. The data being queried is not changing 
rapidly; it is mostly increasing in row count, at a rate of hundreds of rows 
per day, with very few rows being removed.
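For a sense of scale, here is a hedged back-of-the-envelope illustration (this is not Derby's actual enumeration strategy, which prunes and applies cost-based cutoffs): with naive exhaustive enumeration, a join of n tables has n! possible join orders, so even 8 tables give tens of thousands of orderings for the optimizer to consider:

```java
public class JoinOrders {
    // Illustration only: n! join orders for an n-table join under naive
    // exhaustive enumeration, before any pruning the optimizer applies.
    static long factorial(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++) {
            f *= i;
        }
        return f;
    }

    public static void main(String[] args) {
        // For the roughly 8 tables in the query described above:
        System.out.println(factorial(8)); // prints 40320
    }
}
```

The point is only that the plan search space grows factorially with table count, which is why the time spent in plan generation can vary so much between otherwise similar queries.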

So what I would like to explore is: under what conditions could the optimizer 
get into a state where it loops, or makes no progress, for a very long period 
of time?
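One piece of evidence that might help narrow this down (assuming it is practical to enable in your environment): Derby can log the query plan it ultimately chooses for each statement via the derby.language.logQueryPlan property, for example in derby.properties:

```
# Log the chosen query plan for each statement to derby.log
derby.language.logQueryPlan=true
```

Comparing the plan from a fast execution of this query against the plan from a slow one (if the slow instances eventually complete) might show whether the optimizer is settling on a different plan, or simply spending a long time searching.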

Thanks for any insight that you might provide.

On Mar 13, 2014, at 3:32 PM, Bergquist, Brett 
<[email protected]> wrote:

See https://issues.apache.org/jira/browse/DERBY-6510

Basically, it looks like for some reason the system gets stuck trying to 
create the query plan. Copied from the JIRA:

We had an issue today in a production environment at a large customer site. 
Basically, 5 database interactions became stuck and were not progressing. 
Part of the system dump performs a stack trace every few seconds, for a 
period of a minute, on the Glassfish application server and the Derby 
database engine (running in network server mode). The dump also captures the 
current transactions and the current lock table (i.e. syscs_diag.transactions 
and syscs_diag.lock_table). We had to restart the system, and in doing so the 
Derby database engine would not shut down and had to be killed.

The stack traces of the Derby engine show 5 threads that are making 
essentially no progress: at each sample they are at the same point, waiting.
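As an aside, the "same point at every sample" pattern in those traces can be checked programmatically. Here is a hedged, self-contained sketch (StuckDetector and its approach are my own invention, not part of Derby) using ThreadMXBean, which exposes the same data jstack reads:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StuckDetector {
    // Returns true if the thread's top stack frame is identical across two
    // samples taken gapMillis apart -- the pattern seen in the Derby dumps,
    // where the five threads sat at the same frame on every sample.
    static boolean sameTopFrame(long threadId, long gapMillis)
            throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StackTraceElement first = topFrame(mx, threadId);
        Thread.sleep(gapMillis);
        StackTraceElement second = topFrame(mx, threadId);
        return first != null && first.equals(second);
    }

    private static StackTraceElement topFrame(ThreadMXBean mx, long id) {
        ThreadInfo info = mx.getThreadInfo(id, 1); // sample top frame only
        StackTraceElement[] frames = (info == null) ? null : info.getStackTrace();
        return (frames == null || frames.length == 0) ? null : frames[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo: a thread parked forever in Object.wait() shows the same
        // top frame on every sample.
        Thread waiter = new Thread(() -> {
            Object lock = new Object();
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException e) { }
            }
        });
        waiter.setDaemon(true);
        waiter.start();
        Thread.sleep(200); // give the thread time to reach wait()
        System.out.println(sameTopFrame(waiter.getId(), 100));
    }
}
```

Note that an identical top frame across two samples is only suggestive; a CPU-bound loop inside getNextDecoratedPermutation would need several samples with the same deeper frames to distinguish "looping" from merely "slow".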

I will attach the stack traces as well as the state of the transactions and 
locks.

Interestingly, "derby.jdbc.xaTransactionTimeout=1800" is set, yet the 
transactions did not time out. The timeout is 30 minutes, but the 
transactions were in process for hours.

It looks like there is an issue where the optimizer is getting stuck. If you 
look at DRDAConnThread_42, it appears to be trying to create a plan and is 
calling getNextDecoratedPermutation.
Note that all 5 of the transactions were executing the exact same query 
against the database. They were started at different times, however, many 
minutes if not hours apart (i.e. 7:59:25, 8:14:45, 8:40:00, 9:09:11, 
9:18:20), all today.

Note that many hundreds of times during this period, the same exact queries 
completed successfully with no issue. This same query is processed many 
thousands of times per day.

Another point, which cannot be seen from the attached files, is that there 
was another instance of this that did complete, but it took 2 hours and 46 
seconds. So it is possible that these might have completed given enough 
time. We may have seen this before, where the customer has complained of 
processing that took hours, but until today I never had a good capture of 
the state.

I am asking for help on where to look in the source code so that I might be 
able to determine a possible fix. Any help or ideas will be greatly 
appreciated.
