[ http://issues.apache.org/jira/browse/DERBY-1357?page=all ]
A B updated DERBY-1357:
-----------------------
Attachment: d1357_v1.patch
d1357_v1.stat
Attaching a patch, d1357_v1.patch, to address the incorrect logic as noted in
the description.
I also had to update the lang/predicatePushdown.out master file because with
the d1357_v1.patch the order of a couple of qualifiers has changed. Note that
the query plans themselves are exactly the same--the only thing that's changed
is the the qualifier ordering for one query. This change of order occurs
because as part of the costing code in FromBaseTable.estimateCost() the
optimizer transfers predicates from one predicate list to another, gets an
estimated cost, then puts the predicates back into the original list. The
methods to do this transferring are in NestedLoopJoinStrategy.java and
HashJoinStrategy.java. In the former, the predicates are transferred away in
front-to-back order and then transferred back in back-to-front order, which
leads to a reversal of the relevant predicate ordering. Ex. If we have a list
L1 of preds A,B,C and we transfer them to L2 in front-to-back order, L2 will
end up with A,B,C. Then, when we transfer the predicates back to L1, if we
process L2 in back-to-front order, L1 will end up with C,B,A. That said, with
d1357_v1 applied the short-circuit logic prevents the optimizer from trying to
optimize a join order that is known to be bad. This means that we skip an
unnecessary round of optimization and therefore skip one round of order
reversal, which means the order of the predicate qualifiers in the final plan
is now different.
I ran derbyall on Red Hat with sane jars and ibm142, and saw no new failures.
Review/commit would be appreciated.
> Short-circuit logic in optimizer appears to be incorrect...
> -----------------------------------------------------------
>
> Key: DERBY-1357
> URL: http://issues.apache.org/jira/browse/DERBY-1357
> Project: Derby
> Type: Bug
> Components: Performance
> Versions: 10.0.2.0, 10.0.2.1, 10.1.1.0, 10.2.0.0, 10.1.2.0, 10.1.1.1,
> 10.1.1.2, 10.1.2.1, 10.1.2.2, 10.1.2.3, 10.1.2.4
> Reporter: A B
> Assignee: A B
> Priority: Minor
> Fix For: 10.2.0.0
> Attachments: d1357_v1.patch, d1357_v1.stat
>
> When considering different join orders for the FROM tables in a query, the
> optimizer will decide to give up on a join order midway through if the cost
> of that (partial) join order is already higher than the cost of some other
> *complete* join order that the optimizer previously found. This
> "short-circuiting" of a join order can save compilation time.
> That said, the logic to perform this "short-circuit" of a join order is
> currently as follows (from OptimizerImpl.java):
> /*
> ** Pick the next table in the join order, if there is an unused position
> ** in the join order, and the current plan is less expensive than
> ** the best plan so far, and the amount of time spent optimizing is
> ** still less than the cost of the best plan so far, and a best
> ** cost has been found in the current join position. Otherwise,
> ** just pick the next table in the current position.
> */
> boolean joinPosAdvanced = false;
> if ((joinPosition < (numOptimizables - 1)) &&
> ((currentCost.compare(bestCost) < 0) ||
> (currentSortAvoidanceCost.compare(bestCost) < 0)) &&
> ( ! timeExceeded )
> )
> {
> ...
> }
> There are two "current costs" in this statement: one for the cost if the
> optimizer is calculating a "sort avoidance" plan (which it does if there is a
> required row ordering on the results) and one if it is calculating a plan for
> which row order is not important.
> I admit that I'm not all that familiar with what goes on with the costing of
> a sort-avoidance plan, but inspection of the code shows that, when there is
> no required row ordering--i.e. when we aren't looking for a sort-avoidance
> plan--the cost field of currentSortAvoidanceCost will always be 0.0d. That in
> turn means that in the above "if" statement, the check for
> ((currentCost.compare(bestCost) < 0) ||
> (currentSortAvoidanceCost.compare(bestCost) < 0))
> will always return true (because bestCost should--in theory--always be
> greater than 0.0d). Thus, in the case where we don't have a required row
> ordering, the short-circuit logic will fail even if currentCost is actually
> greater than bestCost.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira