Jeffrey Lichtman wrote:
Based on logic in the code, the example query isn't flattenable. . .
That's because whoever wrote the code made it handle only the simplest
case. I doubt it would be hard to make it flatten many other types of
table subqueries.
The example I gave was a simplified scenario to show how a PRN can end up with a
SelectNode beneath it--which was (I believe?) the example requested by Satheesh.
The actual query that prompted this question, though, has a subquery that uses
aggregates and a GROUP BY--i.e. the subquery *cannot*, as I understand it, be
flattened into the outer query, because the aggregate/group-by functionality has
to be performed before evaluation of the outer query can occur. Ex.
select t1.i, x1.s1 from t1 inner join (select distinct j, sum(b) s1 from t2
group by j) x1 on x1.j = t1.i;
In a case like this, where the subquery _can't_ be flattened, it still seems to
me that a hash join could be beneficial--but because of the logic in
ProjectRestrictNode.isMaterializable(), the hash join isn't allowed. So what
I'm wondering is _why_ is that logic there? That is, when a subquery cannot be
flattened into an outer query, the optimizer always considers a hash join to be
infeasible. Why is that?
If the answer is simply that "no one has looked at removing this restriction
yet", then that's fine--that's what I want to know. If, however, there is a
deliberate reason for leaving this restriction in place, I was hoping someone
out there knew what that reason was. The comments in PRN.isMaterializble() seem
to suggest this wasn't meant to be a permanent restriction, so my guess is that
"no one has done it yet" is the correct answer.
My general philosophy toward query performance issues is that I prefer
massaging the query into a standard form and letting the optimizer
handle it to putting in special-case logic for certain types of queries.
I agree, avoiding special-case is good. Which is why the special-case logic for
PRN's over non-optimizable child nodes in isMaterializable() seems odd to me,
and hence my question.
The optimizer can do things the rest of query processing would have a
difficult time with. For example, if an inner join in a subquery is
flattened into the outer query, the optimizer is free to put the tables
from the subquery anywhere in the join order, even if it means
interspersing the subquery's tables with the outer query's tables.
This is, as you say, a good reason to look at increasing Derby's ability to
flatten subqueries. But in cases where subqueries simply cannot be flattened, I
think the restriction in PRN.isMaterializable() is still going to be a cause for
sub-optimal performance, because it disallows hash joins where they could
potentially be useful.
Thanks for your patience with my questions on this topic; I'm just trying to get
a grasp on how this all is supposed to work...
Army
- Re: Optimizing subqueries [ Was: Re: VTI, Indexed Lookup... Army
-