[
https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572260#action_12572260
]
A B commented on DERBY-2998:
----------------------------
Hi Thomas, thanks for your continued work on this.
> Sorry about that :|
Not a problem :)
I haven't had a chance to look at the failing queries that you refer to in
test7, but I did take a look at patch 15. Some comments/questions after my
first run through are below.
1) There are a lot of places with logic similar to the following:
ResultColumn rc = (ResultColumn) resultColumns.elementAt(index);
if (rc.getExpression() instanceof WindowFunctionColumnNode)
{
...
}
Since WindowFunctionColumnNode extends ResultColumn, I wonder if it would be
worth it to add a new "isWindowColumn()" method (or something like that) to
ResultColumn.java and let that method contain all of the necessary logic for
finding a WindowFunctionColumnNode. If that was done then the above type of
"if" expression becomes simply:
ResultColumn rc = (ResultColumn) resultColumns.elementAt(index);
if (rc.isWindowColumn())
{
...
}
which makes it easier to extend the check for "window columns" in the future if
needed. For example, would it ever be necessary to walk further down the
ResultColumn's expression chain to look for a RowNumber column? That is, could
we ever have:
ResultColumn
-> VirtualColumnNode
-> ResultColumn
-> ...
-> RowNumberColumnNode
If so, then the "isWindowColumn()" as defined in ResultColumn.java could
include the necessary logic to walk the chain, which would simplify the calling
code. See, for example, comments #5 and #6 below.
2)
In WindowNode.bind() there is the following:
/* At this stage there is hopefully a PRN below us as the fromList */
return this;
It might be good to add an ASSERT here to make sure that the 'hopefully'
condition is satisified...
3)
Many of the fields in WindowNode (and other new classes added by this patch)
are "public", but some of them are never referenced outside of WindowNode.
Unless a "public" declaration is functionally necessary, it seems like it would
be safer to make them private and then supply the necessary getter/setter
methods where needed, to avoid accidental modification.
4)
In WindowNode.java there is a field "level" which is initialized to -1 and then
is set from SelectNode.java. But as far as I can tell there are no comments
explaining what that field does. I think I was able to figure it out from the
code, but it'd be nice to have explanatory comments within WindowNode itself.
And per comment #3 above, I think it'd be nice to use a setter method in
SelectNode instead of direct assignment to the public field.
5)
In ColumnReference.java there are two different kinds of checks to see if the
ColumnReference is pointing to a WindowColumnNode. The first is via the method
"pointsToWindowFunction()", which does an instanceof check on the source
ResultColumn's expression and returns true if it is a WindowFunctionColumnNode.
The second is in the categorize() method, where the logic only evaluates to
true if the source ResultColumn's expression is a *VirtualColumnNode* whose
source ResultColumn's expression is a WindowFunctionColumnNode. So the checks
do not appear to be equivalent. Is that intentional? If so, then explanatory
comments might be good here. Also, I believe this is a good example of the
kind of code that might benefit from an "isWindowColumn()" function on
ResultColumn, per comment #1 above.
6)
In ResultColumnList.java, there is the following:
/* Window function columns are added by the WindowResultSet */
if (sourceExpr instanceof WindowFunctionColumnNode ||
(sourceExpr instanceof VirtualColumnNode &&
(sourceExpr.getSourceResultColumn().getExpression()
instanceof WindowFunctionCumnNode)))
{
continue;
}
See comment #1 above for an alternate (cleaner?) approach...
7)
The result set statistics node for WindowResultSet does not currently print out
the name of the result set...(I don't think). This is pretty minor, but I was
briefly confused when I saw the RealWindowResultSetStatistics class and yet the
query plans weren't showing anything "window" releated.
8)
If I'm understanding correctly, the execution-time logic for WindowResultSet
operates in a fashion similar to ProjectRestrictResultSet. That is, it
iterates through all of its source's rows and, for each one, it applies a
restriction. If the row fails the restriction, then the iteration moves on to
the next source row, and so on. This agrees with the following comments,
pulled from the getNextRowCore() method of WindowResultSet:
/*
* Loop until we get a row from the source that qualifies, or there are
* no more rows to qualify. For each iteration fetch a row from the
* source, and evaluate against the restriction if any.
*/
If this is correct, then I'm not entirely sure we'll get the performance boost
that one might perhaps expect from a query like:
SELECT * FROM (
SELECT row_number() over () as r, t.* FROM T
) AS tmp WHERE r <= 3;
I made a similar comment on Februrary 5th that was based on materialization of
the source result set, and that comment ended up being wrong because I
misunderstood materialization. But in looking at this query again, I think
something might still be slightly sub-optimal here.
If we take the approach of "loop until we get a row that qualifies", then we
will loop through _all_ of the rows in table T, even though only the first 3
satisfy the restriction. In the end we will correctly return the first 3 rows
(and only the first 3 rows) but not before we've read all of the others from
disk and applied the "<= 3" restriction to each one. This can be observed by
either a) adding debug printouts to WindowResultSet.getNextRowCore() for every
call to "source.getNextRowCore()", or b) looking at the query plan. For the
above query, the plan shows the following for the top-most
ProjectRestrictResultSet:
******* Project-Restrict ResultSet (1):
Number of opens = 1
Rows seen = 1280
Rows filtered = 1277
restriction = true
So we actually read all 1280 rows from disk, then filtered 1277 of them out so
that, in the end, we only returned 3 rows. Thus from a performance perspective
the ROW_NUMBER() function does not appear to improve things. Is that a known
limitation of the current development (which, for the record, would be
perfectly acceptable)?
Having said that, there is absolutely _NO_ need for you to boost performance
before committing the changes :) Incremental development is good and it's
great to have the functionality working. Performance improvements can always
be addressed later, if you (or anyone else) so chooses. I'm just raising the
issue to make sure that I understand how things are working.
9)
For the query mentioned in comment #8 above, the query plan shows:
ProjectRestrictResultSet (sees ALL rows, restricts down to 3)
-> ProjectRestrictResultSet (sees ALL rows)
-> WindowResultSet (sees ALL rows)
I intuitively expected the restriction to appear at the level of the
WindowResultSet, as opposed to appearing at the level of the top-most PRN. The
plan as shown above makes it look like the restriction is not being enforced by
the Window ResultSet, but is instead being enforced by the ProjectRestrict. Is
that correct? Or is this just a case where the query plan fails to capture
what is really happening with the WindowResultSet? I'm guessing the latter,
but am not sure...
> Add support for ROW_NUMBER() window function
> --------------------------------------------
>
> Key: DERBY-2998
> URL: https://issues.apache.org/jira/browse/DERBY-2998
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Thomas Nielsen
> Assignee: Thomas Nielsen
> Priority: Minor
> Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff,
> d2998-12.diff, d2998-12.stat, d2998-13.diff, d2998-13.stat, d2998-14.diff,
> d2998-14.stat, d2998-15.diff, d2998-15.stat, d2998-4.diff, d2998-4.stat,
> d2998-5.diff, d2998-5.stat, d2998-6.diff, d2998-6.stat, d2998-7.diff,
> d2998-7.stat, d2998-8.diff, d2998-8.stat, d2998-9-derby.log, d2998-9.diff,
> d2998-9.stat, d2998-doc-1.diff, d2998-doc-1.stat, d2998-test.diff,
> d2998-test.stat, d2998-test2.diff, d2998-test2.stat, d2998-test3.diff,
> d2998-test3.stat, d2998-test4.diff, d2998-test4.stat, d2998-test6.diff,
> d2998-test7.diff
>
>
> As part of implementing the overall OLAP Operations features of SQL
> (DERBY-581), implement the ROW_NUMBER() window function.
> More information about this feature is available at
> http://wiki.apache.org/db-derby/OLAPRowNumber
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.