Thomas Nielsen wrote:
As part of my work on DERBY-2998, the implementation of the ROW_NUMBER()
window function, I'm having trouble with the optimizer and its (lack of)
pushing predicates down the query tree.
[ snip ]
To add to the info I see the same kind of queryplan in the similar query
where the aliased expression is exchanged with 'a*b':
SELECT * FROM (SELECT a*b AS R, T.* FROM T) AS TR WHERE R < 3;
So I'm starting to wonder if this is a general Derby problem, where
restrictions on aliased expression columns are not properly optimized?
I think this is the key, yes. When I ran the above query and did some
tracing, Derby correctly "categorized" the predicate as "pushable" (see
Predicate.categorize(), and especially the "categorize()" method of
ColumnReference) and then tried to push the predicate down into the
inner SELECT query. In ProjectRestrictNode.pushExpressions() we see:
/* If this is a PRN above a SelectNode, probably due to a
* view or derived table which couldn't be flattened, then see
* if we can push any of the predicates which just got pushed
* down to our level into the SelectNode.
*/
if (pushPList != null && (childResult instanceof SelectNode))
{
pushPList.pushExpressionsIntoSelect(
(SelectNode) childResult, false);
}
which brings us to PredicateList.pushExpressionIntoSelect(...), where we
find the following:
ColumnReference ref = (ColumnReference)e.nextElement();
if (!ref.pointsToColumnReference())
{
state = false;
break;
}
That is to say, if the ColumnReference's source does not in turn point
to another column reference, then "state" will become false and the
predicate will *not* be pushed. For the example of "a*b as R", the
column reference "R" points to a BinaryArithmeticOperatorNode (for the
expression "a*b"), which is not a ColumnReference. Thus the predicate
is *not* pushed down into the SELECT.
I'm guessing the same thing happens for your ROW_NUMBER() query. Since
ROW_NUMBER() is not a ColumnReference, the predicate is not pushed down
into the SELECT, which means it will be enforced by the
ProjectRestrictNode which sits above the SelectNode. Hence the
following at the top of your query plan:
******* Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 5
Rows filtered = 3
*restriction = true* <== Predicate applied at the level of the PRN
So I think that explains the behavior you're seeing. As to whether or
not this "incorrect", I'm not sure. Seems more likely that this is a
case where support for arbitrary expressions under the ColumnReference
was perhaps too complex (ex. subqueries, method calls, etc.) and so no
support for that was added.
You could perhaps change "ref.pointsToColumnReference()" in the above
code to account for OLAP-related functions like ROW_NUMBER(), but I do
not know off-hand if that would/could lead to problems down the road.
Might be worth a shot, though, to see what happens...
Does that help?
Army