Army,
Thanks for taking the time to have a look at this.
To add to the info I see the same kind of queryplan in the similar
query where the aliased expression is exchanged with 'a*b':
SELECT * FROM (SELECT a*b AS R, T.* FROM T) AS TR WHERE R < 3;
So I'm starting to wonder if this is a general Derby problem, where
restrictions on aliased expression columns are not properly optimized?
I think this is the key, yes. When I ran the above query and did some
tracing, Derby correctly "categorized" the predicate as "pushable"
(see Predicate.categorize(), and especially the "categorize()" method
of ColumnReference) and then tried to push the predicate down into the
inner SELECT query. In ProjectRestrictNode.pushExpressions() we see:
/* If this is a PRN above a SelectNode, probably due to a
* view or derived table which couldn't be flattened, then see
* if we can push any of the predicates which just got pushed
* down to our level into the SelectNode.
*/
if (pushPList != null && (childResult instanceof SelectNode))
{
pushPList.pushExpressionsIntoSelect(
(SelectNode) childResult, false);
}
which brings us to PredicateList.pushExpressionIntoSelect(...), where
we find the following:
ColumnReference ref = (ColumnReference)e.nextElement();
if (!ref.pointsToColumnReference())
{
state = false;
break;
}
That is to say, if the ColumnReference's source does not in turn point
to another column reference, then "state" will become false and the
predicate will *not* be pushed. For the example of "a*b as R", the
column reference "R" points to a BinaryArithmeticOperatorNode (for the
expression "a*b"), which is not a ColumnReference. Thus the predicate
is *not* pushed down into the SELECT.
If you look at my comments in DERBY-2998, I acutally got this far a
while ago. I even added a if(! ref.pointsToOLAPFunction()) in the above
if statement to hadle the ROW_NUMBER() function. But it had some
sideeffects (see below).
Neat - seems I was on the right track after all :D
I'm guessing the same thing happens for your ROW_NUMBER() query.
Correct - see above. My newly introduced RowNumberColumnNode is a
ResultColumn (which is again a ValueNode), so no ColumnReferences here.
Since ROW_NUMBER() is not a ColumnReference, the predicate is not
pushed down into the SELECT, which means it will be enforced by the
ProjectRestrictNode which sits above the SelectNode. Hence the
following at the top of your query plan:
******* Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 5
Rows filtered = 3
*restriction = true* <== Predicate applied at the level of the PRN
So I think that explains the behavior you're seeing. As to whether or
not this "incorrect", I'm not sure. Seems more likely that this is a
case where support for arbitrary expressions under the ColumnReference
was perhaps too complex (ex. subqueries, method calls, etc.) and so no
support for that was added.
Sounds reasonable.
You could perhaps change "ref.pointsToColumnReference()" in the above
code to account for OLAP-related functions like ROW_NUMBER(), but I do
not know off-hand if that would/could lead to problems down the road.
Might be worth a shot, though, to see what happens...
One of the results of adding the extra if (!ref.pointsToOLAPFunction())
was that certain very simple joins started failing. That actually lead
me to believe I was headed down the very wrong track. It would seem like
that assumption was wrong.
I should probably reenable the push, and start tracking down why the
join fails.
Does that help?
This helps a lot! Thanks again Army!
Cheers,
Thomas