Kathey Marsden wrote:
> The test sets derby.optimizer.noTimeout so it should come up with the lower cost plan each time.

This is probably silly, but have you confirmed that the property is in fact being set correctly? If you print out the value of the "noTimeout" field from the OptimizerImpl constructor, does it in fact show "true"?
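For what it's worth, here's a standalone sketch of the kind of sanity check I mean. This is not Derby code (Derby reads its properties through its own property machinery, and the class/method names below are invented for illustration); it only confirms that the system property is visible to the JVM at all:

```java
public class CheckNoTimeout {
    // Returns true only if derby.optimizer.noTimeout is set to "true"
    // (case-insensitive) as a JVM system property.
    static boolean isNoTimeoutSet() {
        return Boolean.parseBoolean(
            System.getProperty("derby.optimizer.noTimeout", "false"));
    }

    public static void main(String[] args) {
        System.out.println("derby.optimizer.noTimeout = " + isNoTimeoutSet());
    }
}
```

If this prints "false" when you expected "true", the property may be getting set in derby.properties (or via the test harness) rather than on the command line, in which case a println inside the OptimizerImpl constructor itself is the more reliable check.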

> My question is where in the code should I look for possible
> problems with plan selection.

Unfortunately there's no single place where these kinds of problems occur. The underlying cause can be in any number of places and may not be easily discernible...

> I can't reproduce in the debugger so will need to just use println's to diagnose the problem.

If you're in a situation where you have to use printlns, you may want to start by printing out the contents of the bestJoinOrder array in OptimizerImpl in two places: 1) at the end of rememberBestCost(), and 2) at the end of modifyAccessPaths(). The former shows you every point at which the OptimizerImpl thinks it has found a new "best join order" so far; the latter shows you the OptimizerImpl's _final_ choice of "best join order". If you print out the cost of the join orders (bestCost) as well, and then compare the results from a "fail" run and a "pass" run, that might give you an indication of where to start. But as I said, there's no silver bullet for tracking that kind of thing down...
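As a rough illustration of the trace output I have in mind, here's a standalone helper (the class and method names are invented; in the real code you'd just drop a println into rememberBestCost() and modifyAccessPaths(), where bestJoinOrder is an int array of Optimizable numbers):

```java
import java.util.Arrays;

public class JoinOrderTrace {
    // Format a join order and its estimated cost the way a temporary
    // println in the optimizer might, tagged with where it was printed.
    static String trace(String where, int[] joinOrder, double cost) {
        return where + ": bestJoinOrder=" + Arrays.toString(joinOrder)
            + " bestCost=" + cost;
    }

    public static void main(String[] args) {
        // e.g. a three-table query where the optimizer settled on the
        // order "third table, first table, second table"
        System.out.println(trace("rememberBestCost", new int[] {2, 0, 1}, 41.5));
    }
}
```

Seeing a different final join order (or a wildly different bestCost) between the "pass" and "fail" runs tells you the divergence happened during optimization, not during execution.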

From a very quick glance at the two plans I noticed two things: 1) the cost estimates for the "fail" plan are pretty outrageous--perhaps DERBY-1905 at play here?--which means the values themselves are not really useful for comparison; and 2) the "pass" plan uses a Hash Join while the "fail" plan uses a Nested Loop join. With respect to #2, the optimizer will deliberately skip a hash join if it does not think the JVM has enough memory to handle all of the rows. So is it possible that in the runs where the query fails the JVM has less available memory?

On a related note, I think the comments before the query in question say:

        // In this query the optimizer will consider pushing, but
        // will find that it's cheaper to do a hash join and thus
        // will _not_ push. So we see hash join with table scan on T3.

The "fail" query plan shows that the predicate _is_ actually being pushed to T1 and T3 (it ends up as a qualifier on T1 and as an index key for T3). Given the ridiculously high cost estimates of doing so, it seems like the optimizer would only have chosen to push the predicates as a last resort--i.e. if the hash join was not possible due to memory constraints. At least that's what I hope based on the plans you provided. It's also possible that for whatever reason the optimizer's estimates for the "fail" run are somehow skewed so that the nested loop join is actually viewed as "cheaper". If that happens then you're probably looking at DERBY-1905...

To see if memory is an issue for the hash join, maybe search for "memoryUsageOK" in OptimizerImpl and print out the results of those calls...?
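The shape of that decision is roughly the following (this is a standalone sketch, not the actual OptimizerImpl code; the limit and the size estimate here are invented purely for illustration): the optimizer estimates how much memory the hash table would need and refuses the hash join if that exceeds the per-table budget.

```java
public class HashJoinMemoryCheck {
    // Invented sketch of a memoryUsageOK-style check: the hash join is
    // allowed only if the estimated build-side size fits in the budget.
    static boolean memoryUsageOK(double estimatedRowCount,
                                 int estimatedBytesPerRow,
                                 long maxMemoryPerTable) {
        return estimatedRowCount * estimatedBytesPerRow <= maxMemoryPerTable;
    }

    public static void main(String[] args) {
        long limit = 1024L * 1024L; // pretend the per-table budget is 1MB
        System.out.println("small table OK? "
            + memoryUsageOK(1000, 100, limit));
        System.out.println("huge table OK?  "
            + memoryUsageOK(1e9, 100, limit));
    }
}
```

So if the "fail" runs happen with a smaller heap, or with inflated row-count estimates (DERBY-1905 again), the same query could flip from hash join to nested loop join.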

Army
