Re: [OPTIMIZER] OptimizerImpl "best plans" for subqueries?

Army Sat, 18 Feb 2006 10:11:05 -0800

Jeffrey Lichtman wrote:

I think it may be necessary when remembering "truly the best" access path at any level of subquery to tell each table subquery to remember its best access path as "truly the best" also, and to recurse through all the nested subqueries to do this. I don't think this would be very hard to implement.

I was actually toying around with something like this yesterday, so it's nice to hear you suggest a similar approach :) I wrote up a solution that tries to implement it; I ran derbyall last night with the changes and there were no failures. I also ran my predicate pushing code locally with these changes applied (in place of the Phase 1 patch) and everything worked there, too. So I think this is the right way to go--and it feels more robust to me than the first Phase 1 patch for DERBY-805. So thanks for the ideas and reassurance...

I only worry about what would happen with deeply-nested subqueries - this could create an order 2**n algorithm (that is, one whose time would grow exponentially with the number of levels of nesting).

I guess my answer to that is if we're dealing with deeply nested queries, the caller will probably be preparing the query once and then executing it multiple times, in which case the extra optimization cost would be an up-front, one-time thing while the benefits from doing the extra work could lead to huge performance savings for every execution of the query thereafter (when combined with predicate pushdown, that is). Depending on just how much overhead we're adding to the optimization phase, this seems like an acceptable trade-off to me...

The solution I've coded tries to do what you describe. I'll outline it here and post the patch as a new "Phase 1" patch for DERBY-805 since, as I said earlier, I think this issue is really the issue that Phase 1 is trying to address.

First, I added a new field called "optimizerToBestPlanMap" in FromTable, which is the parent class to all Optimizables. This field (which is a HashMap) holds "truly the best access path" (TTBP) for the Optimizable with respect to every OptimizerImpl at or above it, where "at" refers to the OptimizerImpl to which the Optimizable directly belongs.

I then added a new method called "addOrLoadBestPlanMapping" to the Optimizable interface, implemented in FromTable. The signature for this method is:


+       public void addOrLoadBestPlanMapping(boolean doAdd,
+               Optimizer optimizer) throws StandardException
+       {

If "doAdd" is true then we will take the Optimizable's trulyTheBestAccessPath field and put a copy of it into optimizerToBestPlanMap, with the key being the received optimizer. If "doAdd" is false, then we will search the hash map for a TTBP that corresponds to the received optimizer, and if we find one we will load that path information into the Optimizable's trulyTheBestAccessPath field.
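
The add/load behavior described above can be sketched roughly as follows. This is a simplified model, not the patch itself: SketchPath and SketchTable are hypothetical stand-ins for the access path and FromTable classes, the optimizer key is a plain Object, and the generic collection syntax is only there to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an access path: just a name and a cost.
final class SketchPath {
    final String name;
    final double cost;

    SketchPath(String name, double cost) {
        this.name = name;
        this.cost = cost;
    }

    SketchPath copy() {
        return new SketchPath(name, cost);
    }
}

// Hypothetical stand-in for FromTable.
class SketchTable {
    SketchPath trulyTheBestAccessPath;

    // Maps an optimizer to this table's TTBP with respect to that optimizer.
    private final Map<Object, SketchPath> optimizerToBestPlanMap = new HashMap<>();

    void addOrLoadBestPlanMapping(boolean doAdd, Object optimizer) {
        if (doAdd) {
            // Save a copy of the current TTBP, keyed by the received optimizer.
            if (trulyTheBestAccessPath != null) {
                optimizerToBestPlanMap.put(optimizer, trulyTheBestAccessPath.copy());
            }
        } else {
            // Load the saved TTBP for this optimizer, if one was ever saved.
            SketchPath saved = optimizerToBestPlanMap.get(optimizer);
            if (saved != null) {
                trulyTheBestAccessPath = saved.copy();
            }
        }
    }
}
```

In this sketch a load for an optimizer that never saved anything is a no-op, which is one reading of "if we find one" above.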

This new method is overridden by the two Optimizable classes that can have children: namely, SingleChildResultSetNode and TableOperatorNode. In each case the method will first call super.addOrLoadBestPlanMapping(), which adds/loads TTBP for the node itself, and then it will recursively call that method on the child result in two cases: 1) if the child is another Optimizable, then the call is made directly on that child; 2) if the child is a SelectNode, then we call a new method on SelectNode that will iterate through all Optimizables in its FROM list and recursively call addOrLoadBestPlanMapping(). By doing so we "recurse through all nested subqueries", as Jeff said. For example, in SingleChildResultSetNode we have the following:


+       /** @see Optimizable#addOrLoadBestPlanMapping */
+       public void addOrLoadBestPlanMapping(boolean doAdd,
+               Optimizer optimizer) throws StandardException
+       {
+               super.addOrLoadBestPlanMapping(doAdd, optimizer);
+               if (childResult instanceof Optimizable)
+               {
+                       ((Optimizable)childResult).
+                               addOrLoadBestPlanMapping(doAdd, optimizer);
+               }
+               else if (childResult instanceof SelectNode)
+               {
+                       ((SelectNode)childResult).
+                               addOrLoadBestPlanMappings(doAdd, optimizer);
+               }
+       }
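
The SelectNode side of that recursion is not shown above. A rough sketch of what "iterate through all Optimizables in its FROM list" could look like is below; PlanHolder, SketchSelectNode and RecordingTable are hypothetical names used only for illustration, and the real method would work on Derby's own FROM list class rather than a java.util.List.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down version of the Optimizable interface.
interface PlanHolder {
    void addOrLoadBestPlanMapping(boolean doAdd, Object optimizer);
}

// Hypothetical stand-in for SelectNode: it only holds its FROM list.
class SketchSelectNode {
    final List<PlanHolder> fromList = new ArrayList<>();

    // Forward the add/load request to every Optimizable in the FROM list.
    void addOrLoadBestPlanMappings(boolean doAdd, Object optimizer) {
        for (PlanHolder fromTable : fromList) {
            fromTable.addOrLoadBestPlanMapping(doAdd, optimizer);
        }
    }
}

// Test double that records which requests reached it.
class RecordingTable implements PlanHolder {
    final List<String> calls = new ArrayList<>();

    public void addOrLoadBestPlanMapping(boolean doAdd, Object optimizer) {
        calls.add(doAdd ? "add" : "load");
    }
}
```

If an entry in the FROM list is itself a node with children, its own override forwards the call further down, which is how the request would reach every level of nesting.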

The third thing I've done is change the "rememberAsBest()" method on the Optimizable interface to take an instance of Optimizer as an argument. Then whenever an OptimizerImpl decides it has found a best overall path and calls rememberAsBest(), we make an additional call (within rememberAsBest()) to the new addOrLoadBestPlanMapping() method with "doAdd" set to true and passing in the optimizer. This means that whenever an OptimizerImpl finds a best plan, every Optimizable in its list (including all Optimizables found recursively through subqueries) will remember what its TTBP was with respect to that OptimizerImpl.

And as a last step, I've added two calls to addOrLoadBestPlanMapping() to the OptimizerImpl class itself. The first is made with "doAdd" set to true and is called whenever we place an Optimizable at a join position, just before we call "startOptimizing()". We need this call in order to remember what TTBP for each Optimizable (including those within subqueries) is _before_ we begin optimizing, so that if the join order for the current OptimizerImpl ends up being worse than a previous join order, we can "revert" the Optimizable (and recursively any Optimizables from nested subqueries) back to its TTBP for the OptimizerImpl's previously-determined best join order.

The second call to addOrLoadBestPlanMapping() comes when we pull an Optimizable from its current join position, in which case doAdd is "false", which means we take the Optimizable's TTBP corresponding to the current OptimizerImpl and load it into the Optimizable's trulyTheBestAccessPath field. If the OptimizerImpl's most recent join order was the best one so far, TTBP will end up holding the plan that was saved for that join order; otherwise, TTBP will end up holding the "reverted" plan, which is the plan the Optimizable had for whatever join order was previously considered best.
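
To make the save/revert sequence concrete, here is a toy model of one optimizer trying successive join orders for a single table. Everything in it is an assumption made for illustration: RevertableTable and JoinOrderSketch are hypothetical names, a plan is just a String, and the cost comparison stands in for whatever the optimizer really does when deciding that a join order is the best so far.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an Optimizable; a plan is just a String here.
class RevertableTable {
    String trulyTheBestAccessPath;
    private final Map<Object, String> optimizerToBestPlanMap = new HashMap<>();

    void addOrLoadBestPlanMapping(boolean doAdd, Object optimizer) {
        if (doAdd) {
            optimizerToBestPlanMap.put(optimizer, trulyTheBestAccessPath);
        } else if (optimizerToBestPlanMap.containsKey(optimizer)) {
            trulyTheBestAccessPath = optimizerToBestPlanMap.get(optimizer);
        }
    }
}

// Hypothetical stand-in for OptimizerImpl, reduced to the save/revert steps.
class JoinOrderSketch {
    private double bestCost = Double.MAX_VALUE;

    // Tries one join order and returns the plan the table is left holding.
    String tryJoinOrder(RevertableTable table, String candidatePlan, double cost) {
        // Placing the table at a join position: snapshot its TTBP first.
        table.addOrLoadBestPlanMapping(true, this);

        // Optimizing overwrites the table's TTBP field for this join order.
        table.trulyTheBestAccessPath = candidatePlan;

        if (cost < bestCost) {
            // Best join order so far: the rememberAsBest() step saves the new plan.
            bestCost = cost;
            table.addOrLoadBestPlanMapping(true, this);
        }

        // Pulling the table from its join position: load whatever is saved,
        // which is either the new best plan or the snapshot taken on placement.
        table.addOrLoadBestPlanMapping(false, this);
        return table.trulyTheBestAccessPath;
    }
}
```

With this model, a cheap join order followed by a more expensive one leaves the table holding the plan from the cheap one, which is the "revert" case described above.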

Okay, so maybe that's not as clear as I'd like it to be. However, when I post these changes to DERBY-805 (as Phase 1 "v2"), I will also update the DERBY-805 document to include this description and a walk-through example showing how it's all supposed to work. In the meantime, if anyone understands this description well enough to point out any glaring problems, that'd be great. Otherwise, I'll work on cleaning up the patch and adding an example to DERBY-805.html, and will post both later today...


Many thanks to all who continue to follow these optimizer threads,
Army
