Re: Question regarding runtimestatistics and join order

Army Thu, 02 Oct 2008 10:48:34 -0700

Kathey Marsden wrote:

As for how to incorporate this into RuntimeStatisticsParser, the only thing I can think of is to add a boolean orderedSearchStrings(String[] searchStrings) method to RuntimeStatisticsParser that will search for the specified strings in order and return true if they are there in the order they are in the array.


I think this works so long as:

1) the table names in the query do not appear elsewhere in the query plan (ex. a table name of "T" would match the first letter of the word "Table" in "Hash Table ResultSet", which we wouldn't want), and

2) the argument array passed to the new function includes *ALL* tables in the query, not just a subset.
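
To make condition 1 concrete, here is a minimal sketch of the kind of ordered search being proposed. It is only an illustration: the class name OrderedSearchSketch and the extra planText parameter are hypothetical, and this is not the actual RuntimeStatisticsParser code. It assumes the query plan is available as a single string and that matching is plain substring matching.

```java
// Hypothetical sketch; not the actual Derby RuntimeStatisticsParser.
class OrderedSearchSketch {

    /**
     * Returns true if every string in searchStrings occurs in planText,
     * each one starting after the end of the previous match.
     */
    static boolean orderedSearchStrings(String planText, String[] searchStrings) {
        int from = 0;
        for (String s : searchStrings) {
            int idx = planText.indexOf(s, from);
            if (idx < 0) {
                return false;          // string missing, or only present earlier
            }
            from = idx + s.length();   // continue searching after this match
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up plan text in which the real join order is { T2, T }.
        String plan = String.join("\n",
            "Hash Table ResultSet:",
            "  LeftResultSet: T2",
            "  RightResultSet: T");

        // "T" matches the first letter of "Table", so the search for
        // { T, T2 } succeeds even though that is not the join order.
        System.out.println(orderedSearchStrings(plan, new String[] {"T", "T2"}));
    }
}
```

Because the match is a plain substring search, the short table name "T" is satisfied by the word "Table", which is exactly the false positive described in condition 1.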


With respect to #2, if my query is of the form:

  select ... from
    (select ... from t2, t1, t3 where ...) X1,
    (select ... from t1, t2 where ...) X2
  where ...

Assume a test wants to verify that the tables in subquery X2 have a join order of { T2, T1 }, but doesn't really care about the join order of the subquery in X1, nor does it care about the order of X1 w.r.t. X2. You'd *still* have to make sure that the array passed into the ordered search method includes the join order for X1, as well, otherwise the test might incorrectly pass.

For example, if we only check for the join order of the "targeted" subquery X2, meaning we pass ["T2", "T1"] into the proposed method and ignore X1 altogether, then the test would IN-correctly PASS for the following query plan:


  ProjectRestrict:
  +++ JoinNode_0:
  ++++++ LeftResultSet:                  <== This corresponds to X1
  +++++++++ JoinNode_1:
  ++++++++++++ LeftResultSet:
  +++++++++++++++ JoinNode_2:
  ++++++++++++++++++ LeftResultSet: T3
  ++++++++++++++++++ RightResultSet: T2
  ++++++++++++ RightResultSet: T1
  ++++++ RightResultSet:                 <== This corresponds to X2
  +++++++++ JoinNode_3:
  ++++++++++++ LeftResultSet: T1
  ++++++++++++ RightResultSet: T2

If you just search for "T2" followed by "T1", the test will pass because the join order for X1 matches--but that's wrong because it's really X2 that we wanted to check, and X2's join order here is { T1, T2 }.

If instead of ["T2", "T1"] you pass in ["T3", "T2", "T1", "T2", "T1"]--i.e. include *ALL* tables in the query, even the ones that aren't necessarily targeted--then I think you'd get the desired behavior. The downside to this is that the test will fail if a join order about which we "don't care" changes (ex. the join order for X1 in this case). But that's how things work today with the canon-based test, as well, so even if it's not ideal, at least it wouldn't really be any worse...
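
The difference between the two argument arrays can be checked with a small sketch. As before, this is only an illustration under assumptions: the class JoinOrderCheckSketch, the plan() helper, and the planText parameter are hypothetical, and the plan text is a simplified copy of the tree above rather than real runtime statistics output.

```java
// Hypothetical sketch; the plan text mimics the simplified tree above.
class JoinOrderCheckSketch {

    /** True if all searchStrings occur in planText in array order. */
    static boolean orderedSearchStrings(String planText, String[] searchStrings) {
        int from = 0;
        for (String s : searchStrings) {
            int idx = planText.indexOf(s, from);
            if (idx < 0) {
                return false;
            }
            from = idx + s.length();
        }
        return true;
    }

    /** Builds the plan with X1 fixed at { T3, T2, T1 } and X2 as given. */
    static String plan(String x2Left, String x2Right) {
        return String.join("\n",
            "ProjectRestrict:",
            "+++ JoinNode_0:",
            "++++++ LeftResultSet:",
            "+++++++++ JoinNode_1:",
            "++++++++++++ LeftResultSet:",
            "+++++++++++++++ JoinNode_2:",
            "++++++++++++++++++ LeftResultSet: T3",
            "++++++++++++++++++ RightResultSet: T2",
            "++++++++++++ RightResultSet: T1",
            "++++++ RightResultSet:",
            "+++++++++ JoinNode_3:",
            "++++++++++++ LeftResultSet: " + x2Left,
            "++++++++++++ RightResultSet: " + x2Right);
    }

    public static void main(String[] args) {
        String wrongPlan = plan("T1", "T2");   // X2 is { T1, T2 }: not what we want
        String[] targeted = {"T2", "T1"};
        String[] full = {"T3", "T2", "T1", "T2", "T1"};

        // The targeted-only search is satisfied by X1's T2 ... T1.
        System.out.println(orderedSearchStrings(wrongPlan, targeted));
        // The full list finds no T1 after X2's T2, so it rejects the plan.
        System.out.println(orderedSearchStrings(wrongPlan, full));
        // With X2 as { T2, T1 } the full list matches.
        System.out.println(orderedSearchStrings(plan("T2", "T1"), full));
    }
}
```

Note that the full list also stops matching if X1's join order changes, which is the "don't care" downside mentioned above.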

To get the ideal behavior (where the test fails if and only if the "targeted" subquery's join order is not what is expected) with the proposed orderedSearchStrings() approach, one would have to ensure that the table names used in the targeted subquery do not appear anywhere else in the query. My guess is that you would have to rewrite a good number of tests to guarantee that, which would probably be non-trivial.

So it seems like the easiest approach would be to follow Kathey's suggestion, but make sure that all tests which use the new method pass in a full list of all base table names in the query (not just a targeted subset).


Army
