Jeffrey Lichtman wrote:
That said, it seems reasonable to think that the optimizer should at
least consider doing a hash join on the subquery, in which case the
join between T2 and T3 could be materialized and then a hash-join
could be done using the predicate x1.j = t1.i.
I think the best thing would be to flatten the subquery into the outer
query so that the optimizer will see it as just another join.
Based on logic in the code, the example query isn't flattenable. In
FromSubquery.preprocess(), there is a check to see if the subquery is flattenable:
/* Return if the FSqry is flattenable()
<snip>
*/
if ((gbl == null || gbl.size() == 0) &&
tableProperties == null &&
subquery.flattenableInFromSubquery(fromList))
{
...
}
"subquery" in the above code is a SelectNode, and the
flattenableInFromSubquery() method of SelectNode has the following:
...
if (this.fromList.size() > 1)
{
return false;
}
...
So the example subquery, which has two tables, isn't flattenable. Note that
even if it _was_ flattenable, the following code in TableOperatorNode.java will
still ultimately call extractSubquery:
leftResultSet = leftResultSet.preprocess(numTables, gbl, fromList);
/* If leftResultSet is a FromSubquery, then we must explicitly extract
* out the subquery (flatten it). (SelectNodes have their own
* method of flattening them.
*/
if (leftResultSet instanceof FromSubquery)
{
leftResultSet = <leftResultSet.extractSubquery(...)>
}
rightResultSet = rightResultSet.preprocess(numTables, gbl, fromList);
/* If rightResultSet is a FromSubquery, then we must explicitly extract
* out the subquery (flatten it). (SelectNodes have their own
* method of flattening them.
*/
if (rightResultSet instanceof FromSubquery)
{
rightResultSet = <rightResultSet.extractSubquery(...)>
}
So either way, we replace the FromSubquery w/ a PRN over a SelectNode, and hence
my earlier question.
Note also that the optimizer _does_ appear to consider this as "just another
join"--it has two optimizables, a FromBaseTable (T1) and a PRN (over the
subquery) and it's trying to find the right join order. What I was wondering is
why the optimizer doesn't ever consider a hash join in this case; the answer is
because the child of the PRN, which is a SelectNode, isn't an optimizable (as
per PRN.isMaterializable()).
That is, of course, just my understanding. Is there a different area that you
can think of where flattening should be occuring? Can you give more info about
where that flattening needs to be done?
BTW, in your example the subquery contains a Cartesian product. Is that
intentional?
Just an example ;)
Army