[
https://issues.apache.org/jira/browse/DERBY-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935181#comment-13935181
]
Mike Matrigali commented on DERBY-6510:
---------------------------------------
I am not sure how without a reproducible case, but it seems important to
determine
if the thread is "hung" not doing anything, or "spinning" doing a lot of work
but not
in a reasonable amount of time. Since the optimizer runs in a tight loop
looking
through plans it is hard to look at snapshots and tell the difference between
hung
and spinning. It looks to me like DRDAConnThread_72 is looping rather than
hung as the stacks to look different at the top in your post, and likely other
threads
are waiting on this statement before executing.
The optimizer is built to look at all permutations of a plan, and this number
of iterations
get multiplied by every index and every table in the query. It is supposed
to stop
once it has spent at least as long looking at plans as it thinks it will take
to execute
the plan - but this logic can easily go wrong with some bad estimates and
outdated
performance assumptions.
Can you identify the full query that the optmizer is working on during the
problem? If so can you force
a optimizer run on a test machine with same ddl and see if that takes a long
time.
Again without a repro hard to do, but it would be interesting to know why this
prepare is happening,
since you mention this statment likely has been executed many times.
i.e.:
1) did not exist in statement cache ever
2) does not exist in statement cache now but did in the past
3) has been invalidated for some reason - what is reason? I think you actually
found/fixed some
problems where the auto statistics was invalidating plans much too often.
Is it likely in production
that there is anything going on that would make statements invalid other
than going out of cache,
or auto stat saying that stats are out of date? For instance is there any
ddl during production, or
compress table?
> Deby engine threads not making progress
> ---------------------------------------
>
> Key: DERBY-6510
> URL: https://issues.apache.org/jira/browse/DERBY-6510
> Project: Derby
> Issue Type: Bug
> Components: Network Server
> Affects Versions: 10.9.1.0
> Environment: Oracle Solaris 10/9, Oracle M5000 32 CPU, 128GB memory,
> 8GB allocated to Derby Network Server
> Reporter: Brett Bergquist
> Priority: Critical
> Attachments: dbstate.log, derbystacktrace.txt
>
>
> We had an issue today in a production environment at a large customer site.
> Basically 5 database interactions became stuck and are not progressing.
> Part of the system dump performs a stack trace every few seconds for a period
> of a minute on the Glassfish application server and the Derby database engine
> (running in network server mode). Also, the dump captures the current
> transactions and the current lock table (ie. syscs_diag.transactions and
> syscs_diag.lock_table). We had to restart the system and in doing so, the
> Derby database engine would not shutdown and had to be killed.
> The stack traces of the Derby engine show 5 threads that are basically making
> no progress in that at each sample, they are at the same point, waiting.
> I will attach the stack traces as well as the state of the transactions and
> locks.
> Interesting is that the "derby.jdbc.xaTransactionTimeout =1800" is set, yet
> the transactions did not timeout. The timeout is for 30 minutes but the
> transactions were in process for hours.
--
This message was sent by Atlassian JIRA
(v6.2#6252)