I’ve tried to follow the thread here to get back to my original question.  Thanks for all the insightful discussion!

 

I suppose what I’ve learned is that the optimizer will only optimize VTIs if they can’t be instantiated multiple times.  Is this true?  If so, how/when is the hashtable created?  Is it created after one complete scan based on the query pattern?

 


From: Satheesh Bandaram [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 17, 2006 7:25 PM
To: [email protected]
Subject: Re: Optimizing subqueries [ Was: Re: VTI, Indexed Lookup and the Query Optimizer ]

 

Hi Jeff,

Jeffrey Lichtman wrote:

Unfortunately, I don't remember why hash joins are prohibited in this case. One thing I notice in looking at the code is the following in HashJoinStrategy.feasible():

Did you forget to cut_and_paste something from HashJoinStrategy.feasible()? You said you noticed something in the code, but was empty after that.

Anyway, I wonder how SelectNode could correctly estimate size of hashtable at runtime? Derby/Cloudscape didn't have a HashTable backing store initially, which made optimizer be very conservative in estimating size of potential hashtable at runtime. While optimizer maintains potential number of rows that might qualify, if this number is OFF at runtime, the query could fail. For FromBaseTable, using statistics, it is possible to estimate size of hash table more accurately. (Derby now has hashtable backing store.)

It seems to me ProjectRestrictNode has another way to perform materialization, through modifyAccessPath() considering materialization as one of the possible plans.
 
        /* We consider materialization into a temp table as a last step.
         * Currently, we only materialize VTIs that are inner tables
         * and can't be instantiated multiple times.  In the future we
         * will consider materialization as a cost based option
.
         */
        return (Optimizable) considerMaterialization(outerTables);

Like the comment says, this materialization is currently only done for VTIs that can't be instantiated multiple times. I wonder if Derby should consider implementing this materialization for all Select-subqueries as a cost based decision. While the HashJoin way of materialization is great where possible, it has limitations. (like need to have a equijoin) This materialization could be more generic form that may cover other cases too, as a cost based decision. This is what I filed under DERBY-781.

Satheesh

The only thing I can suggest is to try removing the restriction and see what happens. I'll try to help if you run into problems.

Reply via email to