Army wrote:
----------------------------------
3) Comparing plan costs.
----------------------------------
In OptimizerImpl.java there is the following code for comparing
"current" cost with "best cost" for this round of optimization:
    /*
    ** Is the cost of this join order lower than the best one we've
    ** found so far?
    **
    ** NOTE: If the user has specified a join order, it will be the
    ** only join order the optimizer considers, so it is OK to use
    ** costing to decide that it is the "best" join order.
    */
    if ((! foundABestPlan) || currentCost.compare(bestCost) < 0)
    {
        rememberBestCost(currentCost, Optimizer.NORMAL_PLAN);

        // Since we just remembered all of the best plans,
        // no need to reload them when pulling Optimizables
        // from this join order.
        reloadBestPlan = false;
    }
The call to "currentCost.compare()" only compares the estimatedCost
fields; it does not look at the other two fields of a
CostEstimate--namely, rowCount and singleScanRowCount.
In running some tests against 10.1, though, I noticed that it is
possible for currentCost and bestCost to have the same estimatedCost
values but different row counts. Since the above check does not look at
row counts, the following two queries can end up with different query
plans:
select * from V1, V2 where V1.j = V2.b;
select * from V2, V1 where V1.j = V2.b;
That is, the queries are identical except for the order of V1 and V2.
In my particular example V1 and V2 are both unions of two tables,
where V2 has 100,000 rows and V1 has fewer than ten. When I ran these
queries against 10.1 with NO optimizer timeout--meaning that the
optimizer had time to examine all possible query plans--the result was
surprising: the first query chose join order [ V2, V1 ] and took 4.5 to
5 seconds to complete; the second query chose join order [ V1, V2 ] and
took 30-33 seconds to complete. Wow.
I made a small change to the CostEstimateImpl.compare() method to
consider rowCount as a secondary field of comparison (if the
estimatedCosts are the same) and singleScanRowCount as a third field of
comparison (if the rowCounts are also the same). With that change the
behavior was what I expected: the optimizer chose the same join order
(the faster one) regardless of which query I ran.
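To illustrate the idea, here is a rough sketch of that tie-breaking
comparison. This is not the actual CostEstimateImpl code; the class
below is a hypothetical stand-in whose fields mirror the three
CostEstimate values discussed above (estimatedCost, rowCount,
singleScanRowCount):

// Hypothetical stand-in; CostEstimateImpl itself is not reproduced here.
class CostSketch
{
    double estimatedCost;
    double rowCount;
    double singleScanRowCount;

    CostSketch(double estimatedCost, double rowCount, double singleScanRowCount)
    {
        this.estimatedCost = estimatedCost;
        this.rowCount = rowCount;
        this.singleScanRowCount = singleScanRowCount;
    }

    /*
    ** Negative means "this" estimate is considered cheaper than "other",
    ** positive means costlier, zero means equivalent on all three fields.
    */
    double compare(CostSketch other)
    {
        // Primary comparison: estimated cost, as the existing code does.
        if (estimatedCost != other.estimatedCost)
            return estimatedCost - other.estimatedCost;

        // Secondary: same estimated cost, so prefer the lower row count.
        if (rowCount != other.rowCount)
            return rowCount - other.rowCount;

        // Tertiary: same row count too, so fall back to singleScanRowCount.
        return singleScanRowCount - other.singleScanRowCount;
    }
}

With tiebreakers like these, two plans that tie on estimatedCost are no
longer treated as interchangeable, so the choice between [ V1, V2 ] and
[ V2, V1 ] no longer depends on which join order the optimizer happens
to evaluate first.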
So my question is this: if two cost estimates have the same
estimatedCost but different rowCounts (or even singleScanRowCounts), is
it reasonable to suppose that the estimate with the lower rowCount is
actually better? In the above case the query with the lower row count
was the one that ran more quickly--is it reasonable to believe that this
will generally be the case?
I would think it would be better to figure out why the costs are the
same for these vastly different plans. That seems like a problem with
the costing itself, and it seems cleaner to me to compare based on one
field. On the face of it there is some bad assumption in building the
cost - maybe it is the 0 cost per hash lookup you also mentioned? What
were the actual two plans?
---
As always, thanks to anyone with the time and inclination to read this
email and provide feedback...
Army