Jeffrey Lichtman wrote:

That said, it seems reasonable to think that the optimizer should at least consider doing a hash join on the subquery, in which case the join between T2 and T3 could be materialized and then a hash-join could be done using the predicate x1.j = t1.i.

I think the best thing would be to flatten the subquery into the outer query so that the optimizer will see it as just another join.

Based on logic in the code, the example query isn't flattenable. In FromSubquery.preprocess(), there is a check to see if the subquery is flattenable:

/* Return if the FSqry is flattenable()
<snip>
*/
if ((gbl == null || gbl.size() == 0) &&
    tableProperties == null &&
    subquery.flattenableInFromSubquery(fromList))
{
    ...
}

"subquery" in the above code is a SelectNode, and the flattenableInFromSubquery() method of SelectNode has the following:

...
if (this.fromList.size() > 1)
{
        return false;
}
...

So the example subquery, which has two tables, isn't flattenable. Note that even if it _was_ flattenable, the following code in TableOperatorNode.java will still ultimately call extractSubquery:

leftResultSet = leftResultSet.preprocess(numTables, gbl, fromList);
/* If leftResultSet is a FromSubquery, then we must explicitly extract
 * out the subquery (flatten it).  (SelectNodes have their own
 * method of flattening them.
 */
if (leftResultSet instanceof FromSubquery)
{
        leftResultSet = <leftResultSet.extractSubquery(...)>
}

rightResultSet = rightResultSet.preprocess(numTables, gbl, fromList);
/* If rightResultSet is a FromSubquery, then we must explicitly extract
 * out the subquery (flatten it).  (SelectNodes have their own
 * method of flattening them.
 */
if (rightResultSet instanceof FromSubquery)
{
        rightResultSet = <rightResultSet.extractSubquery(...)>
}

So either way, we replace the FromSubquery w/ a PRN over a SelectNode, and hence my earlier question.

Note also that the optimizer _does_ appear to consider this as "just another join"--it has two optimizables, a FromBaseTable (T1) and a PRN (over the subquery) and it's trying to find the right join order. What I was wondering is why the optimizer doesn't ever consider a hash join in this case; the answer is because the child of the PRN, which is a SelectNode, isn't an optimizable (as per PRN.isMaterializable()).

That is, of course, just my understanding. Is there a different area that you can think of where flattening should be occuring? Can you give more info about where that flattening needs to be done?

BTW, in your example the subquery contains a Cartesian product. Is that intentional?

Just an example ;)

Army

Reply via email to