Thanks for making this page, Bryan! I found this mail thread useful, so it is good make a Wiki page for it!
Dag Apache Wiki <[EMAIL PROTECTED]> writes: > Dear Wiki user, > > You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for > change notification. > > The following page has been changed by BryanPendleton: > http://wiki.apache.org/db-derby/QueryPlanJoinOrder > > The comment on the change is: > Edit the discussion from the derby-dev list into a wiki page > > New page: > = Understanding Join Order displays in the Statement Execution Plan = > > One of the results of running the query optimizer is the determination of the > join order. > For a particular join there is a "left" table and a "right" table, and > overall there is > an ordering of the various intermediate joins into an overall join order. > > Looking at the output from the statement execution plan one might ask : > * what is the overall join order? > * and, for a particular join, which table is the left table and which is the > right table? > > As a general rule, when looking at a given query plan the join order is > reflected by the order in which you see the table names as you scroll from > the top of the plan downward. > > In the plan "T1" appears first (i.e. closer to the beginning of the plan), > then "T3" appears afterward, so that's a general indication that the join > order is "T1", "T3". > > Note that the query plan shows the underlying base table names, > not correlation names or view names or other ways of aliasing base tables, so > with respect to the top-level query in question, i.e. to: > {{{ > select x1.j, x2.b from > (select distinct i,j from t1) x1, > (select distinct a,b from t3) x2 > where x1.i = x2.a > order by x1.j, x2.b > }}} > the join order is _technically_ { X1, X2 }. But the query plan only shows > base table access, so we have to look to see what tables X1 and X2 access. > In this case X1 accesses T1 while X2 accesses T3, so when we scan the plan > and see { T1, T3 }, that effectively implies a join order of { X1, X2 }. > > On a more general level, the "scan downward" approach to finding the join > order works because the query plan is written in terms of "left" and "right" > result sets, as Knut Anders mentioned. If I'm joining three tables T1, T2, > T3 and the join order chosen by the optimizer is {T2, T3, T1} the final query > tree will look something like: > {{{ > JOIN_0 > / \ > JOIN_1 T1 > / \ > T2 T3 > }}} > Notice how each join node has a "left" and a "right" child. The query plan > is generated in depth-first traversal order (starting with the root), so the > query plan for the above tree would look something like: > {{{ > JoinResultSet_0: > +++ LeftResultSet: > ++++++ JoinResultSet_1: > +++++++++ LeftResultSet: T2 > +++++++++ RightResultSet: T3 > +++ RightResultSet: T1 > }}} >>From this we can see that the order in which the tables appear in the query >>plan (reading top to bottom) will match the order that comes from reading the >>leaf nodes of the join tree left-to-right, and that in turn reflects the join >>order chosen by the optimizer. > > === Extracting join order information automatically from the > RUNTIMESTATISTICS output === > > Kathey proposed enhancing the RuntimeStatisticsParser to automatically > determine join order by searching > for the specified strings in order and return true if they are there in the > order they are in the array. > > Army observed that when using this to write automated tests, the test author > must be careful to ensure > that the argument array passed to the new function includes *ALL* tables in > the query, not just a subset. > If the query is of the form: > {{{ > select ... from > (select ... from t2, t1, t3 where ...) X1 > (select ... from t1, t2 where ...) X2 > where ... > }}} > Assume a test wants to verify that the tables in subquery X2 have a join > order of { T2, T1 }, but doesn't really care about the join order of the > subquery in X1, nor does it care about the order of X1 w.r.t. X2. You'd > *still* have to make sure that the array passed into the ordered search > method includes the join order for X1, as well, otherwise the test might > incorrectly pass. > > For example, if we only check for the join order of the "targeted" subquery > X2, meaning we pass ["T2", "T1"] into the proposed method and ignore X1 > altogether, then the test would IN-correctly PASS for the following query > plan: > {{{ > ProjectRestrict: > +++ JoinNode_0: > ++++++ LeftResultSet: <== This corresponds to X1 > +++++++++ JoinNode_1: > ++++++++++++ LeftResultSet: > +++++++++++++++ JoinNode_2: > ++++++++++++++++++ LeftResultSet: T3 > ++++++++++++++++++ RightResultSet: T2 > ++++++++++++ RightResultSet: T1 > ++++++ RightResultSet: <== This corresponds to X2 > +++++++++ JoinNode_3: > ++++++++++++ LeftResultSet: T1 > ++++++++++++ RightResultSet: T2 > }}} > > If you just search for "T1" followed by "T2", the test will pass because the > join order for X1 matches--but that's wrong because it's really X2 that we > wanted to check. > > If instead of {"T1", "T2"} you pass in {"T3", "T2", "T1", "T2", "T1"}--i.e. > include *ALL* tables in the query, even the ones that aren't necessarily > targeted--then I think you'd get the desired behavior. The downside to this > is that the test will fail if a join order about which we "don't care" > changes (ex. the join order for X1 in this case). But that's how things work > today with the canon-based test, as well, so even if it's not ideal, at least > it wouldn't really be any worse... > > To get the ideal behavior (where the test fails if and only if the "targeted" > subquery's join order is not what is expected) with the proposed > orderedSearchStrings() approach, one would have to ensure that the table > names used in the targeted subquery do not appear anywhere else in the query. > My guess is that you would have to rewrite a good number of tests to > guarantee that, which would probably be non-trivial. > > This page was compiled with information posted to the derby-dev mailing list > by Manjula, Kathey, Army and Knut. > -- Dag H. Wanvik, staff engineer Sun Microsystems, Databases (JavaDB/Derby) Haakon VII gt. 7b, N-7485 Trondheim, Norway Tel: x43496/+47 73842196, Fax: +47 73842101
