I think you've found a bug, but the solution you're trying may not be correct. I believe the optimizer currently keeps track of only two "best plans" - the best access path for each table in a query (or subquery) as it's currently being considered in a join order, and the best overall join order and path for each table for the best plan it's found so far.

Am I right in thinking that the first of these two "best plans" is stored in "bestAccessPath", and the second is stored in "trulyTheBestAccessPath"? That's how I read it, but I'd just like to make sure.

As for the problems you're seeing in view.sql and refActions1.sql, I
suspect it's due to the fact that in some queries there are certain join
orders that won't work (e.g. if you pass a column from one table to the
constructor for a VTI, the VTI must come after the other table in the
join order). Your attempted fix is probably clobbering the original best
plan for the subquery in a case where no subsequent best plan is
possible.

Okay, this definitely makes sense. I went back and looked at my quick-fix and I see that I was a bit overzealous--I reset not only the bestCost for the OptimizerImpl, but also the "foundABestPlan" flag and the "bestJoinOrder" field. The fact that I cleared the foundABestPlan flag was what caused the error I mentioned.

That said, I then changed the code so that it only resets the bestCost field (by setting it to the max value possible); I left foundABestPlan and bestJoinOrder untouched. When I did this, views.sql and refActions1.sql both passed. At first I thought this may have just been dumb luck, but the more I think about it, the more I think this is the correct thing to do. Since, as you said, it's possible that "there is no subsequent best plan" for the subquery, we need to keep hold of the fact that we did, at some point, find a best plan, and we need to know what the corresponding join order is. That said, resetting bestCost to the max value does two things:

1. It ensures that if at least one best plan is found for the current subquery with respect to the current join order of the outer query, that best plan will be saved as the best plan for the subquery and it's cost will be returned accordingly.

2. It ensures that if no best plan is found, then the cost for the subquery will be outrageously high (Double.MAX_VALUE) which means the outer query will not choose its current join order--which is good, because the subquery can't be executed with that outer join order. Further, we will still know that there was at some point some outer join order for which the subquery had a valid best plan (because foundABestPlan will still be set) and we will still know the subquery's join order for that best plan (because bestJoinOrder will still be set).

Number 1 would, I believe, fix the issue that started this thread, because the "1 million" plan would be saved (1 million is less than Double.MAX_VALUE) and then rejected as too expensive. Number 2 would, I believe, address your comment that there may in fact be "no subsequent best plan", which means the subquery must know that it did have a valid plan at one point, and it must know what the join order for that plan was. The only information we lose is the cost of that earlier plan--but the cost should already have been viewed and handled by the outer query at the time it was first returned, so I don't think it's required any more...? The fact that views.sql and refActions1.sql pass with this change suggests that this is along the right lines; I would of course like to have someone confirm if I'm seeing this correctly...

Does that sound right, or am I being too optimistic?

For subqueries it may need to keep track of another "best plan" - the plan to use with the best plan found for the outer query.

I'll have to think about whether it gets more complicated than this if subqueries are nested. Would the optimizer have to keep track of even more levels of "best plan" in cases like this (I hope not).

I've been struggling to wrap my head around this question, as well (at least, I *think* this is the question with which I've been struggling ;) And ultimately, I think this ties back to what I was trying to do with the Phase 1 patch for DERBY-805--namely, keep track of "the plan to use with the best plan found for the outer query". But based on this discussion, I wonder if the patch I posted goes far enough--if there are subqueries beneath subqueries, such as can happen if we have unions beneath unions beneath unions, how do we get each of those subqueries to remember its best plan with respect to _every_ level of outer query (there could be several) that sits above it?

That's the question I've been pondering for the past two days; is that also what you are referring to by the above "I'll have to think about" statement? Or am I taking this in a different direction?

In any event, since it appears that this is in fact an issue with the optimizer, I'll file a Jira entry for tracking purposes.

Thanks for the reply, Jeff.  This was very helpful...
Army

Reply via email to