Bryan Pendleton commented on DERBY-2130:
----------------------------------------
> With jumpReset.patch applied, I cannot reproduce the varying optimize times.
> The times are all in a tight range, after nearly 6x as many tests as produced
> the variable optimize times before.
That's good to hear. Thank you for taking the time to re-run your tests.
I'm curious: what is the optimization time that you consistently see with
jumpReset.patch? Is it the lesser time (i.e. 170-200 seconds) or the greater
time (500-650 seconds)? Or something else entirely?
Given that jumpReset.patch appears to solve the variance problem, we are then
left with the increased optimization time vs 10.1, which as I mentioned earlier
is indirectly caused by the DERBY-1357 fix. And as I proposed earlier, I think
the DERBY-1357 fix is itself correct; the "problem" (I use the term loosely) is
with the following if-block in OptimizerImpl.java:
    if (permuteState == JUMPING && !joinPosAdvanced && joinPosition >= 0)
    {
        //not feeling well in the middle of jump
        // Note: we have to make sure we reload the best plans
        // as we rewind since they may have been clobbered
        // (as part of the current join order) before we gave
        // up on jumping.
        reloadBestPlan = true;
        rewindJoinOrder(); //fall
        permuteState = NO_JUMP; //give up
    }
While just removing this check reduces the optimization time back to what it was
for 10.1, that in itself is not a complete solution. The reason is that simple
removal of this if-block can in certain cases lead to an infinite loop. As an
example, when I removed the if-block and then tried running lang/innerjoin.sql,
the test never finished; instead it hung due to an infinite "JUMPING" loop. So
some additional changes would be required.
That said, I would like to emphasize that the underlying problem here seems to
be DERBY-1905. Even if we get 10.2 to run repro.sql as 'quickly' as 10.1, it
still takes 10.1 way too long to optimize the query (90 seconds on a "fairly
powerful Windows machine", according to the description for DERBY-2130),
especially given that there are no rows in any of the tables. The reason is
that the cost estimates are too high (infinity), so the timeout does not take
effect until too late.
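
To illustrate why an infinite cost estimate can keep the timeout from kicking
in, here is a minimal sketch. It is not the actual OptimizerImpl code. It
assumes, for illustration only, that the optimizer stops searching once the
time spent so far exceeds the best plan's cost estimate (treated as a budget in
milliseconds). The class TimeoutSketch and its methods are hypothetical names.

```java
// Illustrative sketch only; NOT the actual Derby OptimizerImpl logic.
// Assumption: the search stops once elapsed time exceeds the best plan's
// estimated cost, with that estimate treated as a time budget in milliseconds.
final class TimeoutSketch {

    /** Hypothetical timeout rule: true once elapsed time exceeds the budget. */
    static boolean timedOut(double elapsedMillis, double bestCostEstimate) {
        return elapsedMillis > bestCostEstimate;
    }

    /**
     * Counts how many join orders are examined before the rule fires,
     * assuming each join order takes a fixed millisPerOrder to evaluate.
     * maxOrders caps the search so the simulation always terminates.
     */
    static int ordersExamined(double bestCostEstimate,
                              double millisPerOrder,
                              int maxOrders) {
        int examined = 0;
        double elapsed = 0.0;
        while (examined < maxOrders && !timedOut(elapsed, bestCostEstimate)) {
            examined++;
            elapsed += millisPerOrder;
        }
        return examined;
    }
}
```

Under these assumptions, a finite estimate cuts the search short, while an
estimate of Double.POSITIVE_INFINITY never triggers the timeout, so the search
runs until every candidate join order (here, maxOrders) has been examined.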
I have been piddling around with DERBY-1905 on and off and early experimentation
shows that if the cost estimates are more reasonable, the repro.sql script
attached to DERBY-2130 completes in 3 or 4 seconds. And that's the case even if
we leave the above if-block exactly as it is right now (i.e. we do *not* remove
it). So this seems to confirm that aside from the variance problem, DERBY-2130
is in some ways an expression of DERBY-1905.
Note, though, that it *may* be too risky to port changes for DERBY-1905 back to
10.2 (I don't know for sure since I don't know what those changes will
ultimately be). Thus it might still be worth it to investigate the
aforementioned if-block angle for the sake of addressing the performance
regression seen between 10.1 and 10.2, i.e. to specifically address the 10.2
slowdown filed as DERBY-2130.
Army