[ 
https://issues.apache.org/jira/browse/DERBY-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935181#comment-13935181
 ] 

Mike Matrigali commented on DERBY-6510:
---------------------------------------

I am not sure how to do this without a reproducible case, but it seems
important to determine whether the thread is "hung" and doing nothing, or
"spinning" and doing a lot of work that never finishes in a reasonable
amount of time.  Since the optimizer runs in a tight loop looking through
plans, it is hard to tell hung and spinning apart from snapshots alone.  It
looks to me like DRDAConnThread_72 is looping rather than hung, since the
tops of its stacks differ between the samples in your post, and the other
threads are likely waiting on this statement before executing.
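
One way to tell the two cases apart is to sample the thread's stack and CPU
time twice and compare.  Below is a minimal sketch using the standard
ThreadMXBean API; it assumes it can run inside (or be attached via JMX to)
the network server JVM, and the thread name "DRDAConnThread_72" is taken
from your stack traces.  Repeated jstack dumps a few seconds apart give
roughly the same information if attaching a tool is easier in production.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.Arrays;

    public class SpinOrHung {
        public static void main(String[] args) throws Exception {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            long id = findThreadId(mx, "DRDAConnThread_72");

            StackTraceElement[] stack1 =
                mx.getThreadInfo(id, Integer.MAX_VALUE).getStackTrace();
            long cpu1 = mx.getThreadCpuTime(id);   // -1 if CPU timing unsupported

            Thread.sleep(10000);                   // wait between samples

            StackTraceElement[] stack2 =
                mx.getThreadInfo(id, Integer.MAX_VALUE).getStackTrace();
            long cpu2 = mx.getThreadCpuTime(id);

            // A spinning thread burns CPU and its stack changes between samples;
            // a truly hung (blocked/waiting) thread shows neither.
            System.out.println("CPU time delta (ns): " + (cpu2 - cpu1));
            System.out.println("Stacks identical: " + Arrays.equals(stack1, stack2));
        }

        private static long findThreadId(ThreadMXBean mx, String name) {
            for (long id : mx.getAllThreadIds()) {
                ThreadInfo ti = mx.getThreadInfo(id);
                if (ti != null && name.equals(ti.getThreadName())) {
                    return id;
                }
            }
            throw new IllegalStateException("thread not found: " + name);
        }
    }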

The optimizer is built to look at all permutations of a plan, and that
number of iterations gets multiplied by every index and every table in the
query.  It is supposed to stop once it has spent at least as long looking at
plans as it estimates the best plan found so far will take to execute - but
this logic can easily go wrong with bad cost estimates and outdated
performance assumptions.
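
For illustration only - this is not Derby's actual optimizer code - the
cost-based timeout has roughly the following shape.  The Plan class here is
a hypothetical stand-in for a costed join-order permutation:

    import java.util.List;

    public class OptimizerTimeoutSketch {

        // Hypothetical stand-in for one costed join-order permutation.
        static class Plan {
            final double estimatedExecutionMillis;
            Plan(double ms) { this.estimatedExecutionMillis = ms; }
        }

        static Plan pickPlan(List<Plan> permutations) {
            long start = System.currentTimeMillis();
            Plan best = null;
            for (Plan candidate : permutations) {
                if (best == null
                        || candidate.estimatedExecutionMillis < best.estimatedExecutionMillis) {
                    best = candidate;
                }
                // Cost-based timeout: stop searching once optimization has taken
                // as long as the best plan is expected to take to execute.  If
                // that estimate is wildly high (stale statistics, outdated timing
                // assumptions), the cutoff is effectively never reached and the
                // loop walks every permutation.
                if (System.currentTimeMillis() - start >= best.estimatedExecutionMillis) {
                    break;
                }
            }
            return best;
        }
    }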

Can you identify the full query that the optimizer is working on during the
problem?  If so, can you force an optimizer run (a fresh prepare) on a test
machine with the same DDL and see whether that also takes a long time?
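
A sketch of both steps follows.  The connection URLs and the stand-in query
are placeholders; the SQL_TEXT column of SYSCS_DIAG.TRANSACTION_TABLE should
show the statement each active transaction is running.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class FindAndTimePrepare {

        // Step 1: on the production system during the problem, list the SQL
        // text associated with the active transactions to find the suspect query.
        static void listActiveSql(String url) throws Exception {
            try (Connection c = DriverManager.getConnection(url);
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery(
                     "SELECT XID, STATUS, SQL_TEXT FROM SYSCS_DIAG.TRANSACTION_TABLE")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getString(2)
                            + " : " + rs.getString(3));
                }
            }
        }

        // Step 2: on a test machine with the same DDL, time a cold prepare of
        // the suspect statement.  A prepare that runs for minutes points at
        // the optimizer search itself.
        static void timePrepare(String url, String suspectSql) throws Exception {
            try (Connection c = DriverManager.getConnection(url)) {
                long start = System.currentTimeMillis();
                try (PreparedStatement ps = c.prepareStatement(suspectSql)) {
                    System.out.println("prepare took "
                            + (System.currentTimeMillis() - start) + " ms");
                }
            }
        }

        public static void main(String[] args) throws Exception {
            listActiveSql("jdbc:derby://prodhost:1527/proddb");    // placeholder URL
            timePrepare("jdbc:derby://testhost:1527/testdb",       // placeholder URL
                    "SELECT TABLENAME FROM SYS.SYSTABLES");         // replace with suspect query
        }
    }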

Again, without a repro this is hard to do, but it would be interesting to
know why this prepare is happening at all, since you mention this statement
has likely been executed many times.  That is, which of these applies (a
sketch for checking the cache follows the list):
1) the statement never existed in the statement cache
2) it does not exist in the statement cache now, but did in the past
3) it has been invalidated for some reason - and if so, what is that reason?
   I think you actually found/fixed some problems where automatic statistics
   updates were invalidating plans much too often.  Is it likely that
   anything in production is making statements invalid other than aging out
   of the cache, or the automatic statistics daemon deciding that stats are
   out of date?  For instance, is there any DDL during production, or a
   compress table?
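
A sketch for polling the statement cache; the URL is a placeholder and the
column names follow the SYSCS_DIAG.STATEMENT_CACHE diagnostic table:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DumpStatementCache {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:derby://prodhost:1527/proddb";   // placeholder URL
            try (Connection c = DriverManager.getConnection(url);
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery(
                     "SELECT SCHEMANAME, VALID, SQL_TEXT FROM SYSCS_DIAG.STATEMENT_CACHE")) {
                // Run this periodically: if the suspect statement keeps dropping
                // out of the cache, or keeps flipping to VALID = false, that
                // narrows down which of the three cases above is happening.
                while (rs.next()) {
                    System.out.println(rs.getString("SCHEMANAME") + " valid="
                            + rs.getString("VALID") + " : " + rs.getString("SQL_TEXT"));
                }
            }
        }
    }

If the automatic index statistics daemon turns out to be the trigger,
temporarily setting derby.storage.indexStats.auto=false on a test system is
one way to confirm that theory.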

> Derby engine threads not making progress
> ----------------------------------------
>
>                 Key: DERBY-6510
>                 URL: https://issues.apache.org/jira/browse/DERBY-6510
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Server
>    Affects Versions: 10.9.1.0
>         Environment: Oracle Solaris 10/9, Oracle M5000 32 CPU, 128GB memory, 
> 8GB allocated to Derby Network Server
>            Reporter: Brett Bergquist
>            Priority: Critical
>         Attachments: dbstate.log, derbystacktrace.txt
>
>
> We had an issue today in a production environment at a large customer site.   
> Basically 5 database interactions became stuck and are not progressing.   
> Part of the system dump performs a stack trace every few seconds for a period 
> of a minute on the Glassfish application server and the Derby database engine 
> (running in network server mode).   Also, the dump captures the current 
> transactions and the current lock table (i.e. syscs_diag.transactions and 
> syscs_diag.lock_table).   We had to restart the system and in doing so, the 
> Derby database engine would not shutdown and had to be killed.
> The stack traces of the Derby engine show 5 threads that are basically making 
> no progress in that at each sample, they are at the same point, waiting.
> I will attach the stack traces as well as the state of the transactions and 
> locks.   
> It is interesting that "derby.jdbc.xaTransactionTimeout=1800" is set, yet 
> the transactions did not time out.  The timeout is 30 minutes, but the 
> transactions were in process for hours.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
