Re: [Db-derby Wiki] Update of "QueryPlanJoinOrder" by BryanPendleton

Dag H. Wanvik Tue, 07 Oct 2008 07:40:21 -0700

Thanks for making this page, Bryan! I found this mail thread useful,
so it is good make a Wiki page for it!


Dag

Apache Wiki <[EMAIL PROTECTED]> writes:

> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for 
> change notification.
>
> The following page has been changed by BryanPendleton:
> http://wiki.apache.org/db-derby/QueryPlanJoinOrder
>
> The comment on the change is:
> Edit the discussion from the derby-dev list into a wiki page
>
> New page:
> = Understanding Join Order displays in the Statement Execution Plan =
>
> One of the results of running the query optimizer is the determination of the 
> join order.
> For a particular join there is a "left" table and a "right" table, and 
> overall there is
> an ordering of the various intermediate joins into an overall join order.
>
> Looking at the output from the statement execution plan one might ask :
>  * what is the overall join order?
>  * and, for a particular join, which table is the left table and which is the 
> right table?
>
> As a general rule, when looking at a given query plan the join order is 
> reflected by the order in which you see the table names as you scroll from 
> the top of the plan downward.
>
> In the plan "T1" appears first (i.e. closer to the beginning of the plan), 
> then "T3" appears afterward, so that's a general indication that the join 
> order is "T1", "T3".
>
> Note that the query plan shows the underlying base table names, 
> not correlation names or view names or other ways of aliasing base tables, so
> with respect to the top-level query in question, i.e. to:
> {{{
>   select x1.j, x2.b from
>     (select distinct i,j from t1) x1,
>     (select  distinct a,b from t3) x2
>   where x1.i = x2.a
>   order by x1.j, x2.b
> }}}
> the join order is _technically_ { X1, X2 }.  But the query plan only shows 
> base table access, so we have to look to see what tables X1 and X2 access.  
> In this case X1 accesses T1 while X2 accesses T3, so when we scan the plan 
> and see { T1, T3 }, that effectively implies a join order of { X1, X2 }. 
>
> On a more general level, the "scan downward" approach to finding the join 
> order works because the query plan is written in terms of "left" and "right" 
> result sets, as Knut Anders mentioned.  If I'm joining three tables T1, T2, 
> T3 and the join order chosen by the optimizer is {T2, T3, T1} the final query 
> tree will look something like:
> {{{
>        JOIN_0
>         /  \
>     JOIN_1  T1
>       /  \
>      T2   T3
> }}}
> Notice how each join node has a "left" and a "right" child.  The query plan 
> is generated in depth-first traversal order (starting with the root), so the 
> query plan for the above tree would look something like:
> {{{
> JoinResultSet_0:
> +++ LeftResultSet:
> ++++++ JoinResultSet_1:
> +++++++++ LeftResultSet:  T2
> +++++++++ RightResultSet: T3
> +++ RightResultSet:       T1
> }}}
>>From this we can see that the order in which the tables appear in the query 
>>plan (reading top to bottom) will match the order that comes from reading the 
>>leaf nodes of the join tree left-to-right, and that in turn reflects the join 
>>order chosen by the optimizer.
>
> === Extracting join order information automatically from the 
> RUNTIMESTATISTICS output ===
>
> Kathey proposed enhancing the RuntimeStatisticsParser to automatically 
> determine join order by searching
> for the specified strings in order and return true if they are there in the 
> order they are in the array.
>
> Army observed that when using this to write automated tests, the test author 
> must be careful to ensure
> that the argument array passed to the new function includes *ALL* tables in 
> the query, not just a subset.
> If the query is of the form:
> {{{
>   select ... from
>     (select ... from t2, t1, t3 where ...) X1
>     (select ... from t1, t2 where ...) X2
>   where ...
> }}}
> Assume a test wants to verify that the tables in subquery X2 have a join 
> order of { T2, T1 }, but doesn't really care about the join order of the 
> subquery in X1, nor does it care about the order of X1 w.r.t. X2.  You'd 
> *still* have to make sure that the array passed into the ordered search 
> method includes the join order for X1, as well, otherwise the test might 
> incorrectly pass.
>
> For example, if we only check for the join order of the "targeted" subquery 
> X2, meaning we pass ["T2", "T1"] into the proposed method and ignore X1 
> altogether, then the test would IN-correctly PASS for the following query 
> plan:
> {{{
>   ProjectRestrict:
>   +++ JoinNode_0:
>   ++++++ LeftResultSet:                  <== This corresponds to X1
>   +++++++++ JoinNode_1:
>   ++++++++++++ LeftResultSet:
>   +++++++++++++++ JoinNode_2:
>   ++++++++++++++++++ LeftResultSet: T3
>   ++++++++++++++++++ RightResultSet: T2
>   ++++++++++++ RightResultSet: T1
>   ++++++ RightResultSet:                 <== This corresponds to X2
>   +++++++++ JoinNode_3:
>   ++++++++++++ LeftResultSet: T1
>   ++++++++++++ RightResultSet: T2
> }}}
>
> If you just search for "T1" followed by "T2", the test will pass because the 
> join order for X1 matches--but that's wrong because it's really X2 that we 
> wanted to check.
>
> If instead of {"T1", "T2"} you pass in {"T3", "T2", "T1", "T2", "T1"}--i.e. 
> include *ALL* tables in the query, even the ones that aren't necessarily 
> targeted--then I think you'd get the desired behavior. The downside to this 
> is that the test will fail if a join order about which we "don't care" 
> changes (ex. the join order for X1 in this case).  But that's how things work 
> today with the canon-based test, as well, so even if it's not ideal, at least 
> it wouldn't really be any worse...
>
> To get the ideal behavior (where the test fails if and only if the "targeted" 
> subquery's join order is not what is expected) with the proposed 
> orderedSearchStrings() approach, one would have to ensure that the table 
> names used in the targeted subquery do not appear anywhere else in the query. 
>  My guess is that you would have to rewrite a good number of tests to 
> guarantee that, which would probably be non-trivial.
>
> This page was compiled with information posted to the derby-dev mailing list 
> by Manjula, Kathey, Army and Knut.
>

-- 
Dag H. Wanvik, staff engineer
Sun Microsystems, Databases (JavaDB/Derby)
Haakon VII gt. 7b, N-7485 Trondheim, Norway
Tel: x43496/+47 73842196, Fax:  +47 73842101

Re: [Db-derby Wiki] Update of "QueryPlanJoinOrder" by BryanPendleton

Reply via email to