[ http://issues.apache.org/jira/browse/DERBY-1905?page=all ]

A B updated DERBY-1905:
-----------------------

    Attachment: d1905_sandbox_doNOTCommit.patch

Attaching an incomplete and potentially incorrect patch that should NOT be 
committed.  This is just the result of some playing around that I was doing 
with cost estimates in the Optimizer in an attempt to resolve this issue.  It's 
possible that the premise/theory behind the changes is incorrect and thus that 
this approach is not a viable solution.  However, I am seeing considerable 
improvement in the compilation AND execution times for many large queries after 
applying this patch.

As one example, I'm seeing the following times before and after applying this 
patch when running the DERBY-1777 repro script:

[ Compile times only (the repro does not execute the queries) ]

                            Before     |      After

 ViewerRoot                 3.709s     |     3.782s
 ViewerSanL0              156.900s     |     9.387s
 ViewerOtherL0              0.204s     |     0.374s
 ViewerServerL0             0.093s     |     0.062s
 ViewerStorageL0            0.186s     |     0.145s
 ViewerSanL1              574.928s     |    18.915s
 ViewerOtherL1            163.653s     |     2.793s
 ViewerServerL1           368.006s     |     5.212s
 ViewerSubsystemL1        373.940s     |     2.862s
 ViewerTapeLibraryL1       10.633s     |     1.693s
 ViewerOtherL2           1368.968s     |    64.909s
 ViewerServerL2          2223.887s     |    64.595s
 ViewerPropagatedStatus    82.467s     |     1.906s
 ViewerSubsystemL2       1790.749s     |    70.922s
 ViewerTapeLibraryL2     1738.614s     |    67.455s
 ViewerSwitchL2          2098.310s     |    38.211s

The difference in some cases is as much as 35x.

I also ran the queries attached to DERBY-1205 before and after applying the 
patch.  Without the patch two queries in particular take 40 and 3 seconds to 
compile respectively, and execute in 3 to 5 seconds (these numbers are rough; 
I don't have the exact figures).  With the patch they compile in 10 and 3 
seconds respectively and execute in under a second.

And finally, I ran the script attached to DERBY-2130 before and after the 
patch.  Before the patch the query runs in about 550 to 700 seconds; after the 
patch, it compiles in 5 seconds and executes in less than a second.

BUT the patch is not by any means complete.  When I ran derbyall I saw at least 
20 failures, many of which are execution-time NPEs that, oddly enough, result 
from the changes to the currentCost (which I never would have expected).  Also:

  1. I don't really know anything about "sort avoidance costs", so I have no 
idea if my changes make sense in that regard (I sort of doubt it).

  2. As described more in the discussion for DERBY-2130, I removed an 
"if-block" from OptimizerImpl; that removal actually causes an infinite loop 
in lang/innerjoin.sql (I hacked around it before running derbyall).  So that 
has to be resolved.

  3. I don't know if the "theory" behind my changes makes sense.

Generally speaking, there are two big "theories" behind these changes.

 A. In the optimizeIt() methods of UnionNode, FromTable, and JoinNode we 
multiply the cost estimates by outerCost's row count more times than we 
should.  This part of the patch makes sense to me and I think is probably 
mostly correct.

 B. We shouldn't be adding up the costs of the optimizables for a join order.  
At least, not the way we currently do.  If I have a join order [ 1 3 2 ] then 
the join tree looks like this:

     JOIN[0]
     /     \
 JOIN[1]   Opt[2]
  /     \
Opt[1]  Opt[3]

The cost of Opt[2] is actually the cost of JOIN[0], which is the cost of 
joining JOIN[1] with Opt[2].  So why are we adding the cost of JOIN[1] to the 
cost of JOIN[0]?  Isn't the cost of JOIN[1] already factored into the cost of 
JOIN[0]?  So the changes in this patch basically remove the addition of JOIN[1] 
to JOIN[0] and instead say that the final cost of the join order is simply the 
cost of JOIN[0].
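Using made-up numbers, the double counting described in B looks like this (a sketch only; nothing here is actual Derby code):

```java
// Illustrative numbers for the join-order cost double counting.
public class JoinOrderCostDemo {

    // Current behavior: add each optimizable's cost, even though JOIN[0]'s
    // estimate already includes the work of producing JOIN[1]'s rows.
    static double summedCost(double costJoin0, double costJoin1) {
        return costJoin0 + costJoin1;
    }

    // Patch behavior: the cost of the join order is simply the cost of the
    // outermost join node, JOIN[0].
    static double topLevelCost(double costJoin0) {
        return costJoin0;
    }

    public static void main(String[] args) {
        double costJoin1 = 100.0;            // joining Opt[1] with Opt[3]
        double costJoin0 = costJoin1 + 50.0; // joining that result with Opt[2];
                                             // already includes JOIN[1]'s work

        System.out.println(summedCost(costJoin0, costJoin1)); // 250.0
        System.out.println(topLevelCost(costJoin0));          // 150.0
    }
}
```

The summed estimate charges for JOIN[1] twice; with deeper nesting the overcharge compounds at every level, which is one way small per-join errors can snowball into the huge estimates seen here.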

This could very well be incorrect thinking.  For one thing, I think it ignores 
the cost of actually scanning the rows from Opt[1] (and potentially the other 
optimizables as well), which may be a problem.  That's why this patch should 
not be committed, and why further discussion/investigation is required.  But 
for what it's worth, it really speeds things up :)

YMMV.  If anyone has the time/inclination to pick this up and continue working 
with it (I am not actively pursuing this at the moment), please feel free!

> Optimizer cost estimates for subqueries are way (way) too high.
> ---------------------------------------------------------------
>
>                 Key: DERBY-1905
>                 URL: http://issues.apache.org/jira/browse/DERBY-1905
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance, SQL
>    Affects Versions: 10.2.1.6, 10.3.0.0, 10.1.3.2
>            Reporter: A B
>         Attachments: d1905.sql, d1905_sandbox_doNOTCommit.patch
>
>
> If I run the attached repro (pulled from DERBY-1866) with 
> derby.optimizer.noTimeout=true (meaning the optimizer should take all the 
> time it needs to try out all possible plans within the restrictions of the 
> optimizer overrides), the estimated cost shown in derby.log (if I log the 
> query plan) is over 600k and the estimated row count is over 520k.
> If, as the code in OptimizerImpl seems to expect, the unit for a cost 
> estimate is milliseconds, then the optimizer here is guessing that the query 
> will take over 10 minutes to execute and will return a half-million rows.  
> But in truth the *combined* time for compilation AND execution is just a 
> couple of seconds, and only 15 rows are returned.
> That suggests to me a rather serious problem with the optimizer cost 
> estimates for subqueries.
> Among other things this can have a major impact on the optimizer's timeout 
> mechanism for very deeply-nested queries.  The optimizer will continue to try 
> out different combinations of indexes/joinOrders/joinStrategies until it 
> either exhausts them all or until the number of milliseconds spent optimizing 
> is greater than the "best cost" estimate so far. In the case of the repro for 
> this issue, the optimizer quickly exhausts all of the options and so it 
> finishes in a fair amount of time.
> But in larger queries where there are far more combinations to try (see esp. 
> queries attached to DERBY-1205, DERBY-1777), these severely inflated cost 
> estimates get very large very quickly (sometimes even reaching infinity--see 
> DERBY-1259, DERBY-1260) and so the optimizer just keeps on optimizing and 
> never times out.  The result is that for queries like those in DERBY-1777, 
> the optimizer can spend literally hours optimizing a query which then 
> executes in a matter of seconds.
> I'm still investigating this, but preliminary examination suggests that part 
> of the problem is in some way related to the treatment of "outer costs" by 
> the optimizer--and in particular, it looks like the optimizer is perhaps too 
> aggressive in multiplying cost estimates by row counts pulled from "outer 
> costs".  That's just my first guess at the problem, though; there could of 
> course be far more to it than that...
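For context, the timeout mechanism described in the quoted report amounts to something like the following (a simplified, illustrative sketch; the real logic lives in OptimizerImpl and these names are not Derby's):

```java
// Simplified sketch of the optimizer timeout check described above.
// Not actual Derby code.
public class TimeoutSketch {

    // The optimizer stops trying plans when it has exhausted all
    // permutations, or when the milliseconds already spent optimizing
    // exceed the best cost estimate found so far (costs are treated
    // as milliseconds).
    static boolean shouldStop(double millisSpent, double bestCostSoFar,
                              boolean permutationsExhausted) {
        return permutationsExhausted || millisSpent > bestCostSoFar;
    }

    public static void main(String[] args) {
        // With a sane best-cost estimate, optimization times out quickly:
        System.out.println(shouldStop(5000.0, 2000.0, false));    // true

        // With a wildly inflated (or infinite) estimate, the time spent
        // can never exceed it, so the optimizer never times out:
        System.out.println(
            shouldStop(3600000.0, Double.POSITIVE_INFINITY, false)); // false
    }
}
```

This is why inflated subquery estimates are so damaging: the "best cost" acts as the time budget, so an estimate of infinity means an unbounded search.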

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira