See https://issues.apache.org/jira/browse/DERBY-6510
Basically it looks like for some reason the system gets stuck in trying to create the query plan. Copied from the JIRA: We had an issue today in a production environment at a large customer site. Basically 5 database interactions became stuck and are not progressing. Part of the system dump performs a stack trace every few seconds for a period of a minute on the Glassfish application server and the Derby database engine (running in network server mode). Also, the dump captures the current transactions and the current lock table (ie. syscs_diag.transactions and syscs_diag.lock_table). We had to restart the system and in doing so, the Derby database engine would not shutdown and had to be killed. The stack traces of the Derby engine show 5 threads that are basically making no progress in that at each sample, they are at the same point, waiting. I will attach the stack traces as well as the state of the transactions and locks. Interesting is that the "derby.jdbc.xaTransactionTimeout =1800" is set, yet the transactions did not timeout. The timeout is for 30 minutes but the transactions were in process for hours. t looks like there is an issue where it the optimizer is getting stuck. If you look at DRDAConnThread_42, it looks like it is trying to create an plan whree it is calling getNextDecoratedPermutation. Note that all 5 of the transactions were doing the exact same query to the database. They were started at different times however, many minutes, if not hours apart (ie. 7:59:25, 8.14:45, 8:40:00, 9:09:11, 9:18:20) all today. Note that many hundreds of times during this period, the same exact queries were done successfully with no issue. This same query is processed many thousands of times per day. Another point which cannot be seen by the attached files is that there was another instance of this but it did complete but it took 2 hours and 46 seconds. So it is possible that these might have completed given enough time. We may have seen this before where the customer has complained of processing that has taken hours but until today, I never had a good capture of the state. I am asking for help on where to look in the source code so that I might be able to determine a possible fix and any help or ideas will be greatly appreciated.
