Bryan Pendleton (JIRA) wrote:
Optimizer performance slowdown from 10.1 to 10.2
------------------------------------------------
Just some quick comments based on my first (rather quick) reading of the
description:
- the optimizer changes in 10.2 seem to have given the optimizer many
more choices of possible query plans to consider. I think this means
that, if the optimizer does not time out, it will spend substantially
more time optimizing because there are more choices to evaluate. Does
this by itself mean that the optimizer will take 2.5 times longer in
10.2 than it did in 10.1?
Not necessarily "will", but "could", yes. Take as an example the following
discussion, copied from the html document attached to DERBY-781:
<begin copy>
select <...> from t1, v1 where v1.xxx = t1.yyy ...
where v1 is a view that consitutes a large, non-flattenable subquery taking 100
units of time per "decorated" permutation to optimize, while t1 takes 1 unit of
time per "decorated" permutation to optimize. Assume there's an index on T1.
Prior to the changes for DERBY-781, the optimization may have gone something
like this:
* Try join order {t1, v1}. We'd have two permutations of T1 (table scan
and index scan) and a single permutation (nested loop join) for v1, giving
optimize time of 2 + 100 = 102.
* Try join order {v1, t1}. We'd have a single permutation of V1 (nested
loop join) and four permutations for T1 (table scan w/ nested loop, table scan
w/ hash, index w/ nested loop, index w/ hash), giving optimize time of 4 + 100 =
104.
* Total optimize time: 102 + 104 = 206.
With the changes for DERBY-781, though, the optimization would be:
* Try join order {t1, v1}. We'd have two permutations of T1 (table scan
and index scan) and *two* permutations for v1 (nested loop join and hash join),
giving optimize time of 2 + 200 = *202*.
* Try join order {v1, t1}. We'd have a single permutation of V1 (nested
loop join) and four permutations for T1 (table scan w/ nested loop, table scan
w/ hash, index w/ nested loop, index w/ hash), giving optimize time of 4 + 100 =
104.
* Total optimize time: 202 + 104 = 306.
So the total time it takes to optimize the query in this case jumps from 206
units to 306 units--i.e. almost 50%. If the subquery is fairly simple, this
additional compilation time will be fairly negligible; but if it's a large
and/or complicated subquery, the time to compile can grow significantly.
<end copy>
So in short, one unfortunate side effect of having the optimizer consider hash
joins with subqueries is that we spend more time optimizing. This is typically
worth it because the optimizer can find a better plan--but as you have
discovered, this can also make things worse for queries that have relatively
small execution times.
- something about this query seems to make the costing mechanism go
haywire, and produce extreme costs. While stepping through the
optimization of this query in the debugger I have seen it compute
costs like 1e63 and 1e200. This might be very closely related to
DERBY-1905, although I don't think I'm doing any subqueries here.
But maybe I'm misunderstanding the term "subquery" in DERBY-1905.
Yes, I think this is DERBY-1905. You have views in your query and you are
joining with those views. The views themselves constitute "subqueries" in the
sense that they are queries which need to be compiled and executed as part of
the outer query. Most of the "subquery" discussion that occurs for DERBY-781,
DERBY-805, DERBY-1905, and other related optimizer issues are dealing with
exactly this kind of query--i.e. the views themselves are the "subqueries" in
question. You could just as easily "in-line" the view declaration directly into
the query as a subquery; from an optimizer standpoint, I think the result is the
same...?
At any rate, due to the enormous estimated costs, timeout does not
occur.
Yes, sounds like DERBY-1905.
- the WHERE clause in this query is converted during compilation to
an equivalent IN clause, I believe, which then causes me to run into
a number of the problems described in DERBY-47 and DERBY-713.
Specifically, rather than constructing a plan which involves 4
index probes for the 4 WHERE clause values, the optimizer decides
that an index scan must be performed and that it will have to process
the entire index (because the query uses parameter markers, not
literal values). So perhaps solving DERBY-47 would help me
That might be true. I haven't actually spent any time looking at DERBY-47, but
that certainly seems to be the thorn in a lot of people's sides. That would
definitely constitute a "pain point" for a number of Derby users...
- the optimizer in fact comes up with a "decent" query plan quite quickly.
I have experimented with placing a hard limit into the optimizer
timeout code, so that I can force optimization to stop after an
arbitrary fixed period of time. Then I have been able to set that
value to as low as 1 second, and the optimizer has produced plans
that then execute in a few milliseconds. Of course, I have only tried
this with a trivial amount of data in my database, so it's possible
that the plan produced by the optimizer after just a second of
optimizing is in fact poor, and I'm just not noticing it because my
data sizes are so small.
Does the query plan chosen by the optimizer when it spends 220 seconds (or 15
minutes (!)) execute in a time similar to the other plans that you've seen
(including 10.1), or is it slower?
At this point, what would be really helpful to me would be some suggestions
about some general approaches or techniques to try to start breaking down
and analyzing this problem.
Unfortunately, there are no clean-cut rules to help out with this kind of thing.
You have already determined that the problem is with compilation, not
execution. You have a repro and you verified that the query is much slower with
10.2 than it was 10.1. This is all great information--thank you for your
up-front thoroughness.
At this point my inclination would be to get the query plans for 10.1, trunk
(220 seconds), and trunk (15 minutes). Then, if you have time and resources,
get those query plans again with "derby.optimizer.noTimeout" set to true. With
that information you can then compare the query plans to see if anything
interesting shows up. (This is one reason why it would nice to have improved
query plan logging; the current plan dump has a whole of stuff that is rather
hard to manage).
After that I don't have much wisdom. I myself would start by figuring out what
join order is chosen in 10.1. Then try to figure out how that join order is
treated in 10.2--do we ever get to it, or do we timeout beforehand? If we get
to it, what happens to it? Do we try out the same decorated permuations (index
choices, join strategies) for that permutation? Do we throw it out in favor of
something else?
In terms of the actual code, I added some timeout 'reset' logic for the pushing
of predicates a while ago (I think it was for DERBY-805?). In hindsight I think
that was probably not a good idea, as it could delay timeout for subqueries for
a potentially very long time. This might be a place to start; see
OptimizerImpl.prepForNextRound().
That's all I've got for now. Hopefully there's something helpful in this email;
I'll re-read the description again and see if anything else comes to mind.
Thank you for filing the issue and for your interest in resolving it.
Army
PS. I ran the repro on my laptop with a bunch of apps running in the
background, and it took a full 35 minutes to complete. It *did* complete,
though, which rules out infinite looping. Not exactly thrilling news, I know,
but it's something...