Jeffrey Lichtman wrote:

Based on logic in the code, the example query isn't flattenable. . .

That's because whoever wrote the code made it handle only the simplest case. I doubt it would be hard to make it flatten many other types of table subqueries.

The example I gave was a simplified scenario to show how a PRN can end up with a SelectNode beneath it--which was (I believe?) the example requested by Satheesh. The actual query that prompted this question, though, has a subquery that uses aggregates and a GROUP BY--i.e. the subquery *cannot*, as I understand it, be flattened into the outer query, because the aggregate/group-by functionality has to be performed before evaluation of the outer query can occur. Ex.

select t1.i, x1.s1 from t1 inner join (select distinct j, sum(b) s1 from t2 group by j) x1 on x1.j = t1.i;

In a case like this, where the subquery _can't_ be flattened, it still seems to me that a hash join could be beneficial--but because of the logic in ProjectRestrictNode.isMaterializable(), the hash join isn't allowed. So what I'm wondering is _why_ is that logic there? That is, when a subquery cannot be flattened into an outer query, the optimizer always considers a hash join to be infeasible. Why is that?

If the answer is simply that "no one has looked at removing this restriction yet", then that's fine--that's what I want to know. If, however, there is a deliberate reason for leaving this restriction in place, I was hoping someone out there knew what that reason was. The comments in PRN.isMaterializble() seem to suggest this wasn't meant to be a permanent restriction, so my guess is that "no one has done it yet" is the correct answer.

My general philosophy toward query performance issues is that I prefer massaging the query into a standard form and letting the optimizer handle it to putting in special-case logic for certain types of queries.

I agree, avoiding special-case is good. Which is why the special-case logic for PRN's over non-optimizable child nodes in isMaterializable() seems odd to me, and hence my question.

The optimizer can do things the rest of query processing would have a difficult time with. For example, if an inner join in a subquery is flattened into the outer query, the optimizer is free to put the tables from the subquery anywhere in the join order, even if it means interspersing the subquery's tables with the outer query's tables.

This is, as you say, a good reason to look at increasing Derby's ability to flatten subqueries. But in cases where subqueries simply cannot be flattened, I think the restriction in PRN.isMaterializable() is still going to be a cause for sub-optimal performance, because it disallows hash joins where they could potentially be useful.

Thanks for your patience with my questions on this topic; I'm just trying to get a grasp on how this all is supposed to work...

Army

Reply via email to