While working on changes for DERBY-805 and DERBY-1007, I noticed that the
"timeout" mechanism for the Derby optimizer is never reset for subqueries. As
is often necessary when talking about the optimizer, let me give an example of
what I mean.
If we have a query such as:
select <...> from
(select t1.i, t2.j from t1, t2 where <...>) X1,
T3
where <...>
then we would have one "outer" query and one "subquery". The outer query would
be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1,
t2".
In this case Derby will create two instances of OptimizerImpl: one for the outer
query (call it OI_OQ) and one for the subquery (call it OI_SQ). Each
OptimizerImpl has its own timeout "clock" that it initializes at creation
time--but never resets. If timeout occurs, the OptimizerImpl will stop
searching for "the" best plan and will just take the best plan found so far.
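To make the "clock" behavior concrete, here is a minimal sketch (not Derby's
actual code) of the mechanism described above. The field names mirror
OptimizerImpl's timeOptimizationStarted/timeExceeded; everything else
(the class, the time limit, the checkTimeout() method) is hypothetical:

```java
// Hypothetical sketch of a per-OptimizerImpl timeout clock: it starts
// at creation time and, once exceeded, stays exceeded because nothing
// ever resets it.
public class TimeoutClockSketch {

    private final long timeLimitMillis;      // illustrative limit
    private long timeOptimizationStarted;    // set once, at creation
    private boolean timeExceeded;

    public TimeoutClockSketch(long timeLimitMillis) {
        this.timeLimitMillis = timeLimitMillis;
        this.timeOptimizationStarted = System.currentTimeMillis();
        this.timeExceeded = false;
    }

    /** Checked before costing each permutation; once true, stays true. */
    public boolean checkTimeout() {
        if (!timeExceeded
                && (System.currentTimeMillis() - timeOptimizationStarted)
                        > timeLimitMillis) {
            timeExceeded = true;
        }
        return timeExceeded;
    }

    public static void main(String[] args) throws InterruptedException {
        TimeoutClockSketch clock = new TimeoutClockSketch(100);
        System.out.println(clock.checkTimeout()); // limit not yet reached
        Thread.sleep(250);                        // let the limit pass
        System.out.println(clock.checkTimeout()); // now exceeded
        System.out.println(clock.checkTimeout()); // still exceeded: no reset
    }
}
```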
That said, for every permutation of the outer query a call will be made to
optimize the subquery. To simplify things, let's assume there are only two
permutations of the outer query: one with join order {X1, T3} and another with
join order {T3, X1}.
Now let's say we're looking at the first permutation {X1, T3}. OI_OQ will make
a call to optimize the subquery represented by OI_SQ. Let's further say that
the subquery tries some permutation {T1, T2} and then times out. It then
returns the plan information for {T1, T2} to the outer query. The outer query,
which has *not* yet timed out, then decides to try its second permutation {T3,
X1}. So it again makes a call to optimize the subquery. In this case, the
subquery--which has already timed out--will *immediately* return without trying
to optimize anything. The outer query will then make a decision about its
second permutation based on the un-optimized subquery's plan results.
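The interaction above can be sketched as follows. All names here are
illustrative, not Derby's API; the point is only that once the subquery's
flag is set, later calls from the outer query get back a plan that was
costed under an earlier (and possibly irrelevant) outer permutation:

```java
import java.util.List;

// Hypothetical sketch of the outer-query/subquery interaction: the
// outer query's permutation loop calls into the subquery's optimizer,
// but once the subquery has timed out it returns its saved best plan
// immediately, ignoring the new outer context.
public class NestedTimeoutSketch {

    static class SubqueryOptimizer {
        boolean timeExceeded = false;
        String bestPlanSoFar = null;
        int realOptimizationPasses = 0;

        String optimize(String outerPermutation) {
            if (timeExceeded) {
                // Already timed out on an earlier round: return the old
                // plan without doing any new optimization work.
                return bestPlanSoFar;
            }
            realOptimizationPasses++;
            bestPlanSoFar = "{T1, T2} given " + outerPermutation;
            // Pretend the clock expired during this round.
            timeExceeded = true;
            return bestPlanSoFar;
        }
    }

    public static void main(String[] args) {
        SubqueryOptimizer sub = new SubqueryOptimizer();
        for (String perm : List.of("{X1, T3}", "{T3, X1}")) {
            System.out.println(perm + " -> " + sub.optimize(perm));
        }
        // Only the first round did real work; the second round reused a
        // plan costed under the first round's outer join order.
        System.out.println("passes=" + sub.realOptimizationPasses);
    }
}
```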
My question is: is this intentional behavior?
On the one hand, I can sort of see the logic in not trying to optimize the
subquery after the first time, because if we've "timed out" then we just want to
"wrap things up" as quickly as possible.
On the other hand, the outer query--which is the query representing the
top-level statement that the user is executing--has *not* yet timed out, so it
seems to me like the second call to optimize the subquery should get a "clean
start" instead of timing out right away.
This hasn't really been an issue to date because the "best plan" chosen by the
subquery is typically independent of the outer query's current permutation--with
the exception of "outerCost", which is passed in from the outer query and is
factored into the subquery's cost estimates. Because of this relative
independence, the plan chosen by the subquery would rarely (if ever?) change
with different permutations of the outer query, so if the subquery timed out
once, there was little point in re-optimizing it later.
With my changes for DERBY-805, though, I'm introducing the notion of pushing
predicates from outer queries down into subqueries--which means that the outer
join order can have a very significant impact on the plan chosen by the
subquery. But because the timeout mechanism is never reset, we could end up
skipping the second optimization phase of the subquery, which means we never get
a chance to see how much the outer predicates can help, and thus we could end up
skipping over some plans that have the potential to give us significant
performance improvement.
It's hard to come up with a concrete example of this because it depends on
optimizer timeout, which varies with different machines. But in my work with
DERBY-805 I've seen cases where the optimizer ignores pushed predicates because
of subquery timeout and thus ends up choosing a (hugely) sub-optimal plan.
All of that said, in an attempt to resolve this issue I added the following two
lines to the "prepForNextRound()" method in OptimizerImpl (that method was only
recently added as part of a change for DERBY-805):
+ /* Reset timeout state since we want to timeout on a per-round
+ * basis. Otherwise if we timed out during a previous round and
+ * then we get here for another round, we'll immediately
+ * "timeout" again before optimizing any of the Optimizables--
+ * which means we could be missing out on some helpful
+ * optimizations that weren't available for the previous
+ * round (esp. use of indexes/qualifiers that are made possible
+ * by predicates pushed down from the outer query). So by
+ * resetting the timeout state here, we give ourselves a chance
+ * to do some legitimate optimization before (potentially) timing
+ * out again. This could allow us to find some access plans that
+ * are significantly better than anything we found in previous
+ * rounds.
+ */
+ timeOptimizationStarted = System.currentTimeMillis();
+ timeExceeded = false;
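To illustrate the intended effect of those two lines, here is a small
standalone sketch (again with illustrative names, not Derby's code): with the
reset in prepForNextRound(), each round gets a fresh chance to do real
optimization work before (potentially) timing out again:

```java
// Hypothetical sketch of per-round timeout reset: without the reset,
// the second call to optimizeRound() would return immediately; with
// it, both rounds do real work.
public class ResetPerRoundSketch {

    long timeOptimizationStarted = System.currentTimeMillis();
    boolean timeExceeded = false;
    int realOptimizationPasses = 0;

    void prepForNextRound() {
        // The proposed two-line change: reset timeout state so that we
        // time out on a per-round basis.
        timeOptimizationStarted = System.currentTimeMillis();
        timeExceeded = false;
    }

    void optimizeRound() {
        if (timeExceeded) {
            return;                 // would skip the whole round
        }
        realOptimizationPasses++;   // real optimization work happens here
        timeExceeded = true;        // pretend this round then timed out
    }

    public static void main(String[] args) {
        ResetPerRoundSketch opt = new ResetPerRoundSketch();
        opt.optimizeRound();        // round 1: does real work, times out
        opt.prepForNextRound();     // reset before round 2
        opt.optimizeRound();        // round 2: does real work again
        System.out.println("passes=" + opt.realOptimizationPasses);
    }
}
```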
I applied this change locally in combination with the DERBY-1007 patch and ran
derbyall on Red Hat Linux with the IBM 1.4.2 JVM, with no failures.
Does anyone out there know if this is a safe change? I would appreciate any
feedback/advice here. If I don't hear any objections, I'll file a sub-task to
DERBY-805 for this issue and post the above patch there.
Many thanks in advance to anyone who might have input/feedback here.
Army