Bryan Pendleton (JIRA) wrote:
Optimizer performance slowdown from 10.1 to 10.2
------------------------------------------------

Just some quick comments based on my first (rather quick) reading of the description:

 - the optimizer changes in 10.2 seem to have given the optimizer many
   more choices of possible query plans to consider. I think this means
   that, if the optimizer does not time out, it will spend substantially
   more time optimizing because there are more choices to evaluate. Does
   this by itself mean that the optimizer will take 2.5 times longer in
   10.2 than it did in 10.1?

Not necessarily "will", but "could", yes. Take as an example the following discussion, copied from the html document attached to DERBY-781:

<begin copy>

  select <...> from t1, v1 where v1.xxx = t1.yyy ...

where v1 is a view that consitutes a large, non-flattenable subquery taking 100 units of time per "decorated" permutation to optimize, while t1 takes 1 unit of time per "decorated" permutation to optimize. Assume there's an index on T1. Prior to the changes for DERBY-781, the optimization may have gone something like this:

* Try join order {t1, v1}. We'd have two permutations of T1 (table scan and index scan) and a single permutation (nested loop join) for v1, giving optimize time of 2 + 100 = 102. * Try join order {v1, t1}. We'd have a single permutation of V1 (nested loop join) and four permutations for T1 (table scan w/ nested loop, table scan w/ hash, index w/ nested loop, index w/ hash), giving optimize time of 4 + 100 = 104.
    * Total optimize time: 102 + 104 = 206.

With the changes for DERBY-781, though, the optimization would be:

* Try join order {t1, v1}. We'd have two permutations of T1 (table scan and index scan) and *two* permutations for v1 (nested loop join and hash join), giving optimize time of 2 + 200 = *202*. * Try join order {v1, t1}. We'd have a single permutation of V1 (nested loop join) and four permutations for T1 (table scan w/ nested loop, table scan w/ hash, index w/ nested loop, index w/ hash), giving optimize time of 4 + 100 = 104.
    * Total optimize time: 202 + 104 = 306.

So the total time it takes to optimize the query in this case jumps from 206 units to 306 units--i.e. almost 50%. If the subquery is fairly simple, this additional compilation time will be fairly negligible; but if it's a large and/or complicated subquery, the time to compile can grow significantly.

<end copy>

So in short, one unfortunate side effect of having the optimizer consider hash joins with subqueries is that we spend more time optimizing. This is typically worth it because the optimizer can find a better plan--but as you have discovered, this can also make things worse for queries that have relatively small execution times.

 - something about this query seems to make the costing mechanism go
   haywire, and produce extreme costs. While stepping through the
   optimization of this query in the debugger I have seen it compute
   costs like 1e63 and 1e200. This might be very closely related to
   DERBY-1905, although I don't think I'm doing any subqueries here.
   But maybe I'm misunderstanding the term "subquery" in DERBY-1905.

Yes, I think this is DERBY-1905. You have views in your query and you are joining with those views. The views themselves constitute "subqueries" in the sense that they are queries which need to be compiled and executed as part of the outer query. Most of the "subquery" discussion that occurs for DERBY-781, DERBY-805, DERBY-1905, and other related optimizer issues are dealing with exactly this kind of query--i.e. the views themselves are the "subqueries" in question. You could just as easily "in-line" the view declaration directly into the query as a subquery; from an optimizer standpoint, I think the result is the same...?

   At any rate, due to the enormous estimated costs, timeout does not
   occur.

Yes, sounds like DERBY-1905.

- the WHERE clause in this query is converted during compilation to an equivalent IN clause, I believe, which then causes me to run into
   a number of the problems described in DERBY-47 and DERBY-713.
   Specifically, rather than constructing a plan which involves 4
   index probes for the 4 WHERE clause values, the optimizer decides
   that an index scan must be performed and that it will have to process
   the entire index (because the query uses parameter markers, not
   literal values). So perhaps solving DERBY-47 would help me

That might be true. I haven't actually spent any time looking at DERBY-47, but that certainly seems to be the thorn in a lot of people's sides. That would definitely constitute a "pain point" for a number of Derby users...

 - the optimizer in fact comes up with a "decent" query plan quite quickly.
   I have experimented with placing a hard limit into the optimizer
   timeout code, so that I can force optimization to stop after an
   arbitrary fixed period of time. Then I have been able to set that
   value to as low as 1 second, and the optimizer has produced plans
   that then execute in a few milliseconds. Of course, I have only tried
   this with a trivial amount of data in my database, so it's possible
   that the plan produced by the optimizer after just a second of
   optimizing is in fact poor, and I'm just not noticing it because my
   data sizes are so small.

Does the query plan chosen by the optimizer when it spends 220 seconds (or 15 minutes (!)) execute in a time similar to the other plans that you've seen (including 10.1), or is it slower?

At this point, what would be really helpful to me would be some suggestions
about some general approaches or techniques to try to start breaking down
and analyzing this problem.

Unfortunately, there are no clean-cut rules to help out with this kind of thing. You have already determined that the problem is with compilation, not execution. You have a repro and you verified that the query is much slower with 10.2 than it was 10.1. This is all great information--thank you for your up-front thoroughness.

At this point my inclination would be to get the query plans for 10.1, trunk (220 seconds), and trunk (15 minutes). Then, if you have time and resources, get those query plans again with "derby.optimizer.noTimeout" set to true. With that information you can then compare the query plans to see if anything interesting shows up. (This is one reason why it would nice to have improved query plan logging; the current plan dump has a whole of stuff that is rather hard to manage).

After that I don't have much wisdom. I myself would start by figuring out what join order is chosen in 10.1. Then try to figure out how that join order is treated in 10.2--do we ever get to it, or do we timeout beforehand? If we get to it, what happens to it? Do we try out the same decorated permuations (index choices, join strategies) for that permutation? Do we throw it out in favor of something else?

In terms of the actual code, I added some timeout 'reset' logic for the pushing of predicates a while ago (I think it was for DERBY-805?). In hindsight I think that was probably not a good idea, as it could delay timeout for subqueries for a potentially very long time. This might be a place to start; see OptimizerImpl.prepForNextRound().

That's all I've got for now. Hopefully there's something helpful in this email; I'll re-read the description again and see if anything else comes to mind.

Thank you for filing the issue and for your interest in resolving it.

Army

PS. I ran the repro on my laptop with a bunch of apps running in the background, and it took a full 35 minutes to complete. It *did* complete, though, which rules out infinite looping. Not exactly thrilling news, I know, but it's something...

Reply via email to