Jeffrey Lichtman wrote:

Rather than try to get indexing to work on VTIs, I'd consider why the query optimizer isn't using a hash join strategy, which would probably give much better performance than the Cartesian product you're getting now. The optimizer should consider hash join on a VTI if it is the inner table of the join and it is materializable -

<snip>

Similarly, in a comment for DERBY-781, Jeffrey Lichtman wrote:

If you think about it, we already have a join strategy that materializes the
inner result  set, i.e. hash join.  I would expect the optimizer to at least
consider this strategy for the example given in this enhancement request. We
should check whether the optimizer is making the correct decision about hash
join in this case before implementing materialization logic specific to
unions.

I'd like to add a third scenario to the "why isn't the optimizer considering a hash join?" list: namely, SELECT subqueries.

This one, at least, has a definite place in the code where we can see the check being performed. In ProjectRestrictNode.java, there is the following method:

        /** @see Optimizable#isMaterializable
         *
         * @exception StandardException         Thrown on error
         */
        public boolean isMaterializable()
                throws StandardException
        {
                /* RESOLVE - Disallow arbitrary hash joins on
                 * SELECTS within a derived table for now.
                 * Remove this method once that restriction is removed.
                 */
                if (! (childResult instanceof Optimizable))
                {
                        return false;
                }
                return super.isMaterializable();
        }

This method is called from the "feasible()" methods of both NestedLoopJoinStrategy.java and HashJoinStrategy.java. In the latter case, if this method returns false, the optimizer won't ever try to do a hash join (at least, that's how I read the code). So if we have a subquery, in which case childResult will (or least can) be a SelectNode, the fact that SelectNode is NOT an instance of Optimizable means that this method will return false and the optimizer won't ever consider a hash join when the inner table is a subquery. Is that correct?

According the RESOLVE in this method, this was an intentional decision--anyone out there know why?

The reason I ask is because I started playing around with making the optimizer consider hash joins for subqueries, and I have made good progress in that area. Before I go too far, though, I'm wondering what it is that I'm missing--i.e. why was this disabled?

Anyone know?
Army

Reply via email to