Satheesh Bandaram wrote:
Just remembered, during discussions with Jeffl, he mentioned another one... Improve unnesting of subqueries with more than one table. More work. :-)

<snip list of Optimizer To-Do's>

One other thing to add to this list was mentioned a while ago on derby-dev here:

http://mail-archives.apache.org/mod_mbox/db-derby-dev/200412.mbox/[EMAIL PROTECTED]

In that email Jack talked about his BackingStoreHashtable changes and the need to update Optimizer costing to account for those changes (which were committed as part of 10.1). Specifically, Jack says:

> I would like to work on this, changing BackingStoreHashtable to spill to
> disk when the hash table gets large, and changing the optimizer to
> understand that large hash tables are allowed, but more costly.
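
To make the idea concrete, the spill behavior Jack describes amounts to something like the sketch below. The names here (SpillingHashTable, maxInMemoryRows, createDiskBackedMap) are purely illustrative and are not Derby's actual BackingStoreHashtable API:

import java.util.HashMap;
import java.util.Map;

class SpillingHashTable<K, V>
{
    private final int maxInMemoryRows;
    private final Map<K, V> inMemory = new HashMap<K, V>();
    private Map<K, V> onDisk;  // created lazily on first spill

    SpillingHashTable(int maxInMemoryRows)
    {
        this.maxInMemoryRows = maxInMemoryRows;
    }

    void put(K key, V value)
    {
        if (inMemory.size() < maxInMemoryRows)
        {
            inMemory.put(key, value);  // cheap: row stays in memory
        }
        else
        {
            if (onDisk == null)
            {
                onDisk = createDiskBackedMap();
            }
            onDisk.put(key, value);  // costly: row goes to disk
        }
    }

    V get(K key)
    {
        V v = inMemory.get(key);
        return (v != null || onDisk == null) ? v : onDisk.get(key);
    }

    private Map<K, V> createDiskBackedMap()
    {
        // Stand-in for a disk-backed structure (in Derby this is an
        // overflow to temporary storage); a plain HashMap keeps the
        // sketch self-contained.
        return new HashMap<K, V>();
    }
}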

The changes to spill to disk were implemented for 10.1, but I don't think the optimizer was ever updated accordingly. In particular, it still checks the memory requirements for a hash join and, if the estimate is too high, the Optimizer will skip that access path. I think the code that does this is in the OptimizerImpl.considerCost() method:

/*
** Skip this access path if it takes too much memory.
**
** NOTE: The default assumption here is that the number of rows in
** a single scan is the total number of rows divided by the number
** of outer rows.  The optimizable may over-ride this assumption.
**
** NOTE: This is probably not necessary here, because we should
** get here only for nested loop joins, which don't use memory.
*/
if (!optimizable.memoryUsageOK(
        estimatedCost.rowCount() / outerCost.rowCount(),
        maxMemoryPerTable))
{
    if (optimizerTrace)
    {
        trace(SKIPPING_DUE_TO_EXCESS_MEMORY, 0, 0, 0.0, null);
    }
    return;
}

Note that the comment regarding "this is probably not necessary" appears to be out of date--there are situations where we can and do get to this code when considering the cost of hash joins.

So as another "To-Do", I think we need to update the optimizer to account for the fact that if a hash join requires a lot of memory and thus will (probably?) spill over, its cost should be adjusted accordingly--instead of the plan just being skipped altogether.
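
A rough sketch of what that adjustment might look like, using the snippet above as a starting point. The spillCostFactor constant is purely illustrative, and the exact CostEstimate methods used here (getEstimatedCost(), setCost(), singleScanRowCount()) are my assumptions about the API rather than a tested patch:

double rowsPerScan = estimatedCost.rowCount() / outerCost.rowCount();
if (!optimizable.memoryUsageOK(rowsPerScan, maxMemoryPerTable))
{
    /*
    ** Instead of skipping the access path, inflate the estimated
    ** cost to reflect the extra disk I/O the spilled portion of
    ** the hash table would incur.
    */
    double spillCostFactor = 10.0;  // illustrative, not tuned
    estimatedCost.setCost(
        estimatedCost.getEstimatedCost() * spillCostFactor,
        estimatedCost.rowCount(),
        estimatedCost.singleScanRowCount());
}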

Army
