Jeffrey Lichtman wrote:
Rather than try to get indexing to work on VTIs, I'd consider why the
query optimizer isn't using a hash join strategy, which would probably
give much better performance than the Cartesian product you're getting
now. The optimizer should consider hash join on a VTI if it is the inner
table of the join and it is materializable -
<snip>
Similarly, in a comment for DERBY-781, Jeffrey Lichtman wrote:
If you think about it, we already have a join strategy that materializes the
inner result set, i.e. hash join. I would expect the optimizer to at least
consider this strategy for the example given in this enhancement request. We
should check whether the optimizer is making the correct decision about hash
join in this case before implementing materialization logic specific to
unions.
I'd like to add a third scenario to the "why isn't the optimizer considering a
hash join?" list: namely, SELECT subqueries.
This one, at least, has a definite place in the code where we can see the check
being performed. In ProjectRestrictNode.java, there is the following method:
/** @see Optimizable#isMaterializable
*
* @exception StandardException Thrown on error
*/
public boolean isMaterializable()
throws StandardException
{
/* RESOLVE - Disallow arbitrary hash joins on
* SELECTS within a derived table for now.
* Remove this method once that restriction is removed.
*/
if (! (childResult instanceof Optimizable))
{
return false;
}
return super.isMaterializable();
}
This method is called from the "feasible()" methods of both
NestedLoopJoinStrategy.java and HashJoinStrategy.java. In the latter case, if
this method returns false, the optimizer won't ever try to do a hash join (at
least, that's how I read the code). So if we have a subquery, in which case
childResult will (or least can) be a SelectNode, the fact that SelectNode is NOT
an instance of Optimizable means that this method will return false and the
optimizer won't ever consider a hash join when the inner table is a subquery.
Is that correct?
According the RESOLVE in this method, this was an intentional decision--anyone
out there know why?
The reason I ask is because I started playing around with making the optimizer
consider hash joins for subqueries, and I have made good progress in that area.
Before I go too far, though, I'm wondering what it is that I'm missing--i.e.
why was this disabled?
Anyone know?
Army