Jeffrey Lichtman wrote:
I think it may be necessary when remembering "truly the best" access
path at any level of subquery to tell each table subquery to remember
its best access path as "truly the best" also, and to recurse through
all the nested subqueries to do this. I don't think this would be very
hard to implement.
I was actually toying around with something like this yesterday, so it's nice to
hear you suggest a similar approach :) I wrote up a solution that tries to
implement it; I ran derbyall last night with the changes and there were no
failures. I also ran my predicate pushing code locally with these changes
applied (in place of the Phase 1 patch) and everything worked there, too. So I
think this is the right way to go--and it feels more robust to me than the first
Phase 1 patch for DERBY-805. So thanks for the ideas and reassurance...
I only worry about what would happen with deeply-nested subqueries - this
could create an order 2**n algorithm (that is, one whose time would grow
exponentially with the number of levels of nesting).
I guess my answer to that is if we're dealing with deeply nested queries, the
caller will probably be preparing the query once and then executing it multiple
times, in which case the extra optimization cost would be an up-front, one time
thing while the benefits from doing the extra work could lead to huge
performance savings for every execution of the query thereafter (when combined
with predicate pushdown, that is). Depending on just how much overhead we're
adding to the optimization phase, this seems like an acceptable trade-off to me...
The solution I've coded tries to do what you describe. I'll outline it here and
post the patch as a new "Phase 1" patch for DERBY-805 since as I said earlier, I
think this issue is really the issue that Phase 1 is trying to address.
First, I added a new field called "optimizerToBestPlanMap" in FromTable, which
is the parent class to all Optimizables. This field (which is a HashMap) holds
"truly the best access path" (TTBP) for the Optimizable with respect to every
OptimizerImpl at or above it, where "at" refers to the OptimizerImpl to which
the Optimizable directly belongs.
I then added a new method called "addOrLoadBestPlanMapping" to the Optimizable
interface, implemented in FromTable. The signature for this method is:
+ public void addOrLoadBestPlanMapping(boolean doAdd,
+ Optimizer optimizer) throws StandardException
+ {
If "doAdd" is true then we will take the Optimizable's trulyTheBestAccessPath
field and put a copy of it into optimizerToBestPlanMap, with the key being the
received optimizer. If "doAdd" is false, then we will search for the hash map
for a TTBP that corresponds to the received optimizer, and if we find one we
will load that path information into the Optimizable's trulyTheBestAccessPath field.
This new method is over-written by the two Optimizable classes that can have
children: namely, SingleChildResultSetNode and TableOperatorNode. In each case
the method will first call super.addOrLoadBestPlanMapping(), which adds/loads
TTBP for the node itself, and then it will recursively call that method on the
child result in two cases: 1) if the child is another Optimizable, then the call
is made directly on that child; 2) if the child is a SelectNode, then we call a
new method on SelectNode that will iterate through all Optimizables in its FROM
list and recursively call addOrLoadBestPlanMapping(). By doing so we "recurse
through all nested subqueries", as Jeff said. For example, in
SingleChildResultSetNode we have the following:
+ /** @see Optimizable#addOrLoadBestPlanMapping */
+ public void addOrLoadBestPlanMapping(boolean doAdd,
+ Optimizer optimizer) throws StandardException
+ {
+ super.addOrLoadBestPlanMapping(doAdd, optimizer);
+ if (childResult instanceof Optimizable)
+ {
+ ((Optimizable)childResult).
+ addOrLoadBestPlanMapping(doAdd, optimizer);
+ }
+ else if (childResult instanceof SelectNode)
+ {
+ ((SelectNode)childResult).
+ addOrLoadBestPlanMappings(doAdd, optimizer);
+ }
+ }
The third thing I've done is change the "rememberAsBest()" method on the
Optimizable interface to take an instance of Optimizer as an argument. Then
whenever an OptimizerImpl decides it has found a best overall path and calls
rememberAsBest(), we make an additional call (within rememberAsBest()) to the
new addOrLoadBestPlanMapping() method with "doAdd" set to true and passing in
the optimizer. This means that whenever an OptimizerImpl finds a best plan,
every Optimizable in its list (including all Optimizables found recursively
through subqueries) will remember what its TTBP was with respect to that
OptimizerImpl.
And as a last step, I've added two calls to addOrLoadBestPlanMapping() to the
OptimizerImpl class itself. The first is made with "doAdd" set to true and is
called whenever we place an Optimizable at a join position, just before we call
"startOptimizing()". We need this call in order to remember what TTBP for each
Optimizable (including those within subqueries) is _before_ we begin optimizing,
so that if the join order for the current OptimizerImpl ends up being worse than
a previous join order, we can "revert" the Optimizable (and recursively any
Optimizables from nested subqueries) back to its TTBP for the OptimizerImpl's
previously-determined best join order.
The second call to addOrLoadBestPlanMapping() comes when we pull an Optimizable
from its current join position, in which case doAdd is "false", which we means
we take the Optimizable's TTBP corresponding to the current OptimizerImpl and
load it into the Optimizable's trulyTheBestAccessPath field. If the
OptimizerImpl's most recent join order was the best one so far, TTBP will end up
holding the plan that was saved for that join order; otherwise, TTBP will end up
holding the "reverted" plan, which is the plan the Optimizable had for whatever
join order was previously considered best.
Okay, so maybe that'not as clear as I'd like it to be. However, when I post
these changes to DERBY-805 (as Phase 1 "v2"), I will also update the DERBY-805
document to include this description and a walk-through example showing how it's
all supposed to work. In the meantime, if anyone understands this description
well enough to point out any glaring problems, that'd be great. Otherwise, I'll
work on cleaning up the patch and adding an example to DERBY-805.html, and will
post both later today...
Many thanks to all who continue to follow these optimizer threads,
Army