Jeffrey Lichtman wrote:
I think it may be necessary when remembering "truly the best" access path at any level of subquery to tell each table subquery to remember its best access path as "truly the best" also, and to recurse through all the nested subqueries to do this. I don't think this would be very hard to implement.

I was actually toying around with something like this yesterday, so it's nice to hear you suggest a similar approach :) I wrote up a solution that tries to implement it; I ran derbyall last night with the changes and there were no failures. I also ran my predicate pushing code locally with these changes applied (in place of the Phase 1 patch) and everything worked there, too. So I think this is the right way to go--and it feels more robust to me than the first Phase 1 patch for DERBY-805. So thanks for the ideas and reassurance...

I only worry about what would happen with deeply-nested subqueries - this could create an order 2**n algorithm (that is, one whose time would grow exponentially with the number of levels of nesting).

I guess my answer to that is if we're dealing with deeply nested queries, the caller will probably be preparing the query once and then executing it multiple times, in which case the extra optimization cost would be an up-front, one time thing while the benefits from doing the extra work could lead to huge performance savings for every execution of the query thereafter (when combined with predicate pushdown, that is). Depending on just how much overhead we're adding to the optimization phase, this seems like an acceptable trade-off to me...

The solution I've coded tries to do what you describe. I'll outline it here and post the patch as a new "Phase 1" patch for DERBY-805 since as I said earlier, I think this issue is really the issue that Phase 1 is trying to address.

First, I added a new field called "optimizerToBestPlanMap" in FromTable, which is the parent class to all Optimizables. This field (which is a HashMap) holds "truly the best access path" (TTBP) for the Optimizable with respect to every OptimizerImpl at or above it, where "at" refers to the OptimizerImpl to which the Optimizable directly belongs.

I then added a new method called "addOrLoadBestPlanMapping" to the Optimizable interface, implemented in FromTable. The signature for this method is:

+       public void addOrLoadBestPlanMapping(boolean doAdd,
+               Optimizer optimizer) throws StandardException
+       {

If "doAdd" is true then we will take the Optimizable's trulyTheBestAccessPath field and put a copy of it into optimizerToBestPlanMap, with the key being the received optimizer. If "doAdd" is false, then we will search for the hash map for a TTBP that corresponds to the received optimizer, and if we find one we will load that path information into the Optimizable's trulyTheBestAccessPath field.

This new method is over-written by the two Optimizable classes that can have children: namely, SingleChildResultSetNode and TableOperatorNode. In each case the method will first call super.addOrLoadBestPlanMapping(), which adds/loads TTBP for the node itself, and then it will recursively call that method on the child result in two cases: 1) if the child is another Optimizable, then the call is made directly on that child; 2) if the child is a SelectNode, then we call a new method on SelectNode that will iterate through all Optimizables in its FROM list and recursively call addOrLoadBestPlanMapping(). By doing so we "recurse through all nested subqueries", as Jeff said. For example, in SingleChildResultSetNode we have the following:

+       /** @see Optimizable#addOrLoadBestPlanMapping */
+       public void addOrLoadBestPlanMapping(boolean doAdd,
+               Optimizer optimizer) throws StandardException
+       {
+               super.addOrLoadBestPlanMapping(doAdd, optimizer);
+               if (childResult instanceof Optimizable)
+               {
+                       ((Optimizable)childResult).
+                               addOrLoadBestPlanMapping(doAdd, optimizer);
+               }
+               else if (childResult instanceof SelectNode)
+               {
+                       ((SelectNode)childResult).
+                               addOrLoadBestPlanMappings(doAdd, optimizer);
+               }
+       }

The third thing I've done is change the "rememberAsBest()" method on the Optimizable interface to take an instance of Optimizer as an argument. Then whenever an OptimizerImpl decides it has found a best overall path and calls rememberAsBest(), we make an additional call (within rememberAsBest()) to the new addOrLoadBestPlanMapping() method with "doAdd" set to true and passing in the optimizer. This means that whenever an OptimizerImpl finds a best plan, every Optimizable in its list (including all Optimizables found recursively through subqueries) will remember what its TTBP was with respect to that OptimizerImpl.

And as a last step, I've added two calls to addOrLoadBestPlanMapping() to the OptimizerImpl class itself. The first is made with "doAdd" set to true and is called whenever we place an Optimizable at a join position, just before we call "startOptimizing()". We need this call in order to remember what TTBP for each Optimizable (including those within subqueries) is _before_ we begin optimizing, so that if the join order for the current OptimizerImpl ends up being worse than a previous join order, we can "revert" the Optimizable (and recursively any Optimizables from nested subqueries) back to its TTBP for the OptimizerImpl's previously-determined best join order.

The second call to addOrLoadBestPlanMapping() comes when we pull an Optimizable from its current join position, in which case doAdd is "false", which we means we take the Optimizable's TTBP corresponding to the current OptimizerImpl and load it into the Optimizable's trulyTheBestAccessPath field. If the OptimizerImpl's most recent join order was the best one so far, TTBP will end up holding the plan that was saved for that join order; otherwise, TTBP will end up holding the "reverted" plan, which is the plan the Optimizable had for whatever join order was previously considered best.

Okay, so maybe that'not as clear as I'd like it to be. However, when I post these changes to DERBY-805 (as Phase 1 "v2"), I will also update the DERBY-805 document to include this description and a walk-through example showing how it's all supposed to work. In the meantime, if anyone understands this description well enough to point out any glaring problems, that'd be great. Otherwise, I'll work on cleaning up the patch and adding an example to DERBY-805.html, and will post both later today...

Many thanks to all who continue to follow these optimizer threads,
Army

Reply via email to