Re: [jira] Created: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Army Wed, 29 Nov 2006 16:00:09 -0800

Bryan Pendleton (JIRA) wrote:

Optimizer performance slowdown from 10.1 to 10.2
------------------------------------------------

Just some quick comments based on my first (rather quick) reading of the description:

 - the optimizer changes in 10.2 seem to have given the optimizer many
   more choices of possible query plans to consider. I think this means
   that, if the optimizer does not time out, it will spend substantially
   more time optimizing because there are more choices to evaluate. Does
   this by itself mean that the optimizer will take 2.5 times longer in
   10.2 than it did in 10.1?

Not necessarily "will", but "could", yes. Take as an example the following discussion, copied from the html document attached to DERBY-781:


<begin copy>

  select <...> from t1, v1 where v1.xxx = t1.yyy ...

where v1 is a view that constitutes a large, non-flattenable subquery taking 100 units of time per "decorated" permutation to optimize, while t1 takes 1 unit of time per "decorated" permutation to optimize. Assume there's an index on T1. Prior to the changes for DERBY-781, the optimization may have gone something like this:

* Try join order {t1, v1}. We'd have two permutations of T1 (table scan and index scan) and a single permutation (nested loop join) for v1, giving optimize time of 2 + 100 = 102.

* Try join order {v1, t1}. We'd have a single permutation of V1 (nested loop join) and four permutations for T1 (table scan w/ nested loop, table scan w/ hash, index w/ nested loop, index w/ hash), giving optimize time of 4 + 100 = 104.

    * Total optimize time: 102 + 104 = 206.

With the changes for DERBY-781, though, the optimization would be:

* Try join order {t1, v1}. We'd have two permutations of T1 (table scan and index scan) and *two* permutations for v1 (nested loop join and hash join), giving optimize time of 2 + 200 = *202*.

* Try join order {v1, t1}. We'd have a single permutation of V1 (nested loop join) and four permutations for T1 (table scan w/ nested loop, table scan w/ hash, index w/ nested loop, index w/ hash), giving optimize time of 4 + 100 = 104.

    * Total optimize time: 202 + 104 = 306.

So the total time it takes to optimize the query in this case jumps from 206 units to 306 units--i.e. almost 50%. If the subquery is fairly simple, this additional compilation time will be fairly negligible; but if it's a large and/or complicated subquery, the time to compile can grow significantly.


<end copy>

So in short, one unfortunate side effect of having the optimizer consider hash joins with subqueries is that we spend more time optimizing. This is typically worth it because the optimizer can find a better plan--but as you have discovered, this can also make things worse for queries that have relatively small execution times.
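
For anyone who wants to play with the numbers, the arithmetic in the copied example can be sketched in a few lines of Java. This is only a toy model of the example above, not Derby code: the class and method names are made up for illustration, and it assumes that the optimize time for a join order is simply the sum, over the two tables, of (number of decorated permutations) x (unit cost per permutation).

```java
// Toy model of the DERBY-781 example; names are hypothetical, not Derby APIs.
final class OptimizeCostSketch {

    // Unit costs taken from the example: t1 costs 1 per decorated
    // permutation, the view v1 costs 100 per decorated permutation.
    static final int T1_UNIT = 1;
    static final int V1_UNIT = 100;

    // Optimize time for one join order under the assumed additive model.
    static int joinOrderCost(int t1Perms, int v1Perms) {
        return t1Perms * T1_UNIT + v1Perms * V1_UNIT;
    }

    // Before DERBY-781: v1 is only ever tried with a nested loop join.
    static int totalBefore781() {
        int t1ThenV1 = joinOrderCost(2, 1); // 2 + 100
        int v1ThenT1 = joinOrderCost(4, 1); // 4 + 100
        return t1ThenV1 + v1ThenT1;
    }

    // After DERBY-781: v1 as the inner table is tried with nested loop
    // and hash join, so it is optimized twice in join order {t1, v1}.
    static int totalAfter781() {
        int t1ThenV1 = joinOrderCost(2, 2); // 2 + 200
        int v1ThenT1 = joinOrderCost(4, 1); // 4 + 100
        return t1ThenV1 + v1ThenT1;
    }

    public static void main(String[] args) {
        int before = totalBefore781();
        int after = totalAfter781();
        double increasePct = (after - before) * 100.0 / before;
        System.out.printf("before=%d after=%d increase=%.1f%%%n",
                before, after, increasePct);
    }
}
```

Changing V1_UNIT shows how the extra cost scales with the subquery: with a cheap view the second permutation is noise, with an expensive one it dominates the total.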

 - something about this query seems to make the costing mechanism go
   haywire, and produce extreme costs. While stepping through the
   optimization of this query in the debugger I have seen it compute
   costs like 1e63 and 1e200. This might be very closely related to
   DERBY-1905, although I don't think I'm doing any subqueries here.
   But maybe I'm misunderstanding the term "subquery" in DERBY-1905.

Yes, I think this is DERBY-1905. You have views in your query and you are joining with those views. The views themselves constitute "subqueries" in the sense that they are queries which need to be compiled and executed as part of the outer query. Most of the "subquery" discussion that occurs for DERBY-781, DERBY-805, DERBY-1905, and other related optimizer issues is dealing with exactly this kind of query--i.e. the views themselves are the "subqueries" in question. You could just as easily "in-line" the view declaration directly into the query as a subquery; from an optimizer standpoint, I think the result is the same...?

   At any rate, due to the enormous estimated costs, timeout does not
   occur.


Yes, sounds like DERBY-1905.

 - the WHERE clause in this query is converted during compilation to
   an equivalent IN clause, I believe, which then causes me to run into
   a number of the problems described in DERBY-47 and DERBY-713.
   Specifically, rather than constructing a plan which involves 4
   index probes for the 4 WHERE clause values, the optimizer decides
   that an index scan must be performed and that it will have to process
   the entire index (because the query uses parameter markers, not
   literal values). So perhaps solving DERBY-47 would help me

That might be true. I haven't actually spent any time looking at DERBY-47, but that certainly seems to be the thorn in a lot of people's sides. That would definitely constitute a "pain point" for a number of Derby users...

 - the optimizer in fact comes up with a "decent" query plan quite quickly.
   I have experimented with placing a hard limit into the optimizer
   timeout code, so that I can force optimization to stop after an
   arbitrary fixed period of time. Then I have been able to set that
   value to as low as 1 second, and the optimizer has produced plans
   that then execute in a few milliseconds. Of course, I have only tried
   this with a trivial amount of data in my database, so it's possible
   that the plan produced by the optimizer after just a second of
   optimizing is in fact poor, and I'm just not noticing it because my
   data sizes are so small.

Does the query plan chosen by the optimizer when it spends 220 seconds (or 15 minutes (!)) execute in a time similar to the other plans that you've seen (including 10.1), or is it slower?

At this point, what would be really helpful to me would be some suggestions
about some general approaches or techniques to try to start breaking down
and analyzing this problem.

Unfortunately, there are no clean-cut rules to help out with this kind of thing. You have already determined that the problem is with compilation, not execution. You have a repro and you verified that the query is much slower with 10.2 than it was with 10.1. This is all great information--thank you for your up-front thoroughness.

At this point my inclination would be to get the query plans for 10.1, trunk (220 seconds), and trunk (15 minutes). Then, if you have time and resources, get those query plans again with "derby.optimizer.noTimeout" set to true. With that information you can then compare the query plans to see if anything interesting shows up. (This is one reason why it would be nice to have improved query plan logging; the current plan dump has a whole lot of stuff that is rather hard to manage).
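
As a rough sketch of how those runs might be set up from a Java repro: the snippet below sets Derby properties as JVM system properties before the engine is booted. "derby.optimizer.noTimeout" is the property named above; "derby.language.logQueryPlan" is, as far as I know, the documented property that writes query plans to derby.log. The class and method names are made up for illustration, and the assumption is that the repro calls this before loading the embedded driver.

```java
// Hypothetical helper for the repro; not part of Derby itself.
final class DerbyPlanDiagnosticsSketch {

    // Call this BEFORE the Derby engine boots (i.e. before the first
    // connection is opened), since the properties are assumed to be
    // read at startup.
    static void enablePlanDiagnostics(boolean noTimeout) {
        // Ask Derby to log the chosen query plan for each statement.
        System.setProperty("derby.language.logQueryPlan", "true");
        // Optionally stop the optimizer from timing out, so that it
        // evaluates every join order it would otherwise skip.
        System.setProperty("derby.optimizer.noTimeout",
                Boolean.toString(noTimeout));
    }

    public static void main(String[] args) {
        enablePlanDiagnostics(true);
        System.out.println("noTimeout="
                + System.getProperty("derby.optimizer.noTimeout"));
    }
}
```

The same two properties could instead go in derby.properties or be passed with -D on the command line; running once with noTimeout false and once with it true gives the two sets of plans to compare.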

After that I don't have much wisdom. I myself would start by figuring out what join order is chosen in 10.1. Then try to figure out how that join order is treated in 10.2--do we ever get to it, or do we time out beforehand? If we get to it, what happens to it? Do we try out the same decorated permutations (index choices, join strategies) for that permutation? Do we throw it out in favor of something else?

In terms of the actual code, I added some timeout 'reset' logic for the pushing of predicates a while ago (I think it was for DERBY-805?). In hindsight I think that was probably not a good idea, as it could delay timeout for subqueries for a potentially very long time. This might be a place to start; see OptimizerImpl.prepForNextRound().

That's all I've got for now. Hopefully there's something helpful in this email; I'll re-read the description again and see if anything else comes to mind.


Thank you for filing the issue and for your interest in resolving it.

Army

PS. I ran the repro on my laptop with a bunch of apps running in the background, and it took a full 35 minutes to complete. It *did* complete, though, which rules out infinite looping. Not exactly thrilling news, I know, but it's something...
