Army wrote:
----------------------------------
3) Comparing plan costs.
----------------------------------
In OptimizerImpl.java there is the following code for comparing
"current" cost with "best cost" for this round of optimization:
    /*
    ** Is the cost of this join order lower than the best one we've
    ** found so far?
    **
    ** NOTE: If the user has specified a join order, it will be the
    ** only join order the optimizer considers, so it is OK to use
    ** costing to decide that it is the "best" join order.
    */
    if ((! foundABestPlan) || currentCost.compare(bestCost) < 0)
    {
        rememberBestCost(currentCost, Optimizer.NORMAL_PLAN);

        // Since we just remembered all of the best plans,
        // no need to reload them when pulling Optimizables
        // from this join order.
        reloadBestPlan = false;
    }
The call to "currentCost.compare()" only compares the estimatedCost
fields; it does not look at the other two fields of a
CostEstimate--namely, rowCount and singleScanRowCount.
In running some tests against 10.1, though, I noticed that it is
possible for currentCost and bestCost to have the same estimatedCost
values but different row counts. Since the above check does not look at
row counts, the following two queries can end up with different query
plans:
select * from V1, V2 where V1.j = V2.b;
select * from V2, V1 where V1.j = V2.b;
That is, the queries are identical except for the order of V1 and V2.
In my particular example V1 and V2 are both unions of two tables,
where V2 has 100,000 rows and V1 has fewer than ten. When I ran these
queries against 10.1 with NO optimizer timeout--meaning that the
optimizer had time to examine all possible query plans--the result was
surprising: the first query chose join order [ V2, V1 ] and took 4.5 to
5 seconds to complete; the second query chose join order [ V1, V2 ] and
took 30-33 seconds to complete. Wow.
I made a small change to the CostEstimateImpl.compare() method to
consider rowCount as a secondary field of comparison (if the
estimatedCosts are the same) and singleScanRowCount as a third field of
comparison (if the rowCounts are also the same). With that change the
behavior was what I expected: the optimizer chose the same join order
(the faster one) regardless of which query I ran.
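To illustrate the idea, here is a rough sketch of that tie-breaking
comparison. This is not the actual CostEstimateImpl code; the class
below is a hypothetical stand-in whose fields mirror the three
CostEstimate values discussed above (estimatedCost, rowCount,
singleScanRowCount):

// Hypothetical stand-in; CostEstimateImpl itself is not reproduced here.
class CostSketch
{
    double estimatedCost;
    double rowCount;
    double singleScanRowCount;

    CostSketch(double estimatedCost, double rowCount, double singleScanRowCount)
    {
        this.estimatedCost = estimatedCost;
        this.rowCount = rowCount;
        this.singleScanRowCount = singleScanRowCount;
    }

    /*
    ** Negative means "this" estimate is considered cheaper than "other",
    ** positive means costlier, zero means equivalent on all three fields.
    */
    double compare(CostSketch other)
    {
        // Primary comparison: estimated cost, as the existing code does.
        if (estimatedCost != other.estimatedCost)
            return estimatedCost - other.estimatedCost;

        // Secondary: same estimated cost, so prefer the lower row count.
        if (rowCount != other.rowCount)
            return rowCount - other.rowCount;

        // Tertiary: same row count too, so fall back to singleScanRowCount.
        return singleScanRowCount - other.singleScanRowCount;
    }
}

With tiebreakers like these, two plans that tie on estimatedCost are no
longer treated as interchangeable, so the choice between [ V1, V2 ] and
[ V2, V1 ] no longer depends on which join order the optimizer happens
to evaluate first.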
So my question is this: if two cost estimates have the same
estimatedCost but different rowCounts (or even singleScanRowCounts), is
it reasonable to suppose that the estimate with the lower rowCount is
actually better? In the above case the query with the lower row count
was the one that ran more quickly--is it reasonable to believe that this
will generally be the case?
I would think it would be better to figure out why the costs are the
same for these vastly different plans. That seems like a problem with
the costing itself, and it seems cleaner to me to compare based on one
field. On the face of it there is some bad assumption in building the
cost - maybe it is the 0 cost per hash lookup you also mentioned? What
were the actual two plans?
---
As always, thanks to anyone with the time and inclination to read this
email and provide feedback...
Army