[
https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565963#action_12565963
]
A B commented on DERBY-2998:
----------------------------
Thank you for updating the wiki page with more details, Thomas. A couple of
comments that come to mind from a quick read of that page (without having
examined the code changes themselves in detail):
1. Near the top of the wiki page there is a sentence saying:
"In this example the ROW_NUMBER function is used to limit the
query as soon as the first N rows have been determined".
But in the details that you've added to the wiki, you note the following:
"For the nested select query above we materialize the subquery
select result, and have the outer SELECT pull rows from the
materialized result".
From a quick reading of your writeup, it seems like materialization
of the subquery might be counter-productive? That is, if I have the
following query:
SELECT * FROM (
SELECT row_number() over () as r, t.* FROM T
) AS tmp WHERE r <= 3;
and table T has a thousand rows in it, the ideal would be to use the
ROW_NUMBER() function to limit the scan on T so that we only fetch
the first 3 rows from disk. But if we materialize the subquery, I
think that means the nested SELECT will issue a full scan on table T,
returning all 1000 rows, and we'll "materialize" those into memory.
Then the ROW_NUMBER() function will simply extract out the first 3
rows from the materialized result set. Is that an accurate description
of what you mean by "materialized result", or am I misreading? If
this is in fact what you are proposing, then can you explain a bit more
about what the benefit of such materialization is? Am I right in
thinking that this could negate potential performance gains that might
otherwise come from ROW_NUMBER() as a "limiting" predicate?
The wiki page does say:
"window function results that span multiple rows, like a moving
average, will benefit from materialization..."
which may be true. But it seems like there may be better ways to
deal with such functions than full materialization of the result set.
In particular, the idea of a "sort observer", similar to what is used
for GroupedAggregateResultSet, seems like it could potentially be
useful for such a function? Which brings me to my next comment...
2. Did you by chance look at GroupedAggregateResultSet and its surrounding
code to see if the OLAP functions might extend that class? It's perhaps
overkill for a ROW_NUMBER() function that only supports empty, unnamed
windows. But the OLAP window specification in general seems to share some
common concepts with GROUP BY processing, so I wonder if it'd be possible
to put building blocks in place based on that similarity--ex. to create
some kind of "WindowFunctionValue" class that extends GroupedAggregated-
ResultSet, and then have the various OLAP functions in turn extend
WindowFunctionValue as needed.
I have no idea if/how that would actually work, but I thought I'd mention
it to see if you've had an opportunity to look at that option? Maybe
it's not worth it for the simple ROW_NUMBER() function that you're currently
working on--feel free to say so :)
3. Under "modification of access paths" there is the following:
"We add one WindowNode for each window function column, and they are
evaluated left to right in the result column list, with the right most
column
being the top WindowNode in the query plan."
So if I have:
SELECT
row_number() over () as r1,
row_number() over () as r2,
t.col1
FROM T
Is it correct to say that my query tree will end up as:
ProjectRestrictNode
==> WindowNode (r2)
==> WindowNode (r1)
==> ProjectRestrictNode
==> FromBaseTable
and my execution tree will look like:
ProjectRestrictResultSet
==> WindowResultSet (r2)
==> WindowResultSet (r1)
==> ProjectRestrictResultSet
==> TableScanResultSet
I guess this isn't really a comment, just a quick check to make
sure I'm understanding what is supposed to happen :)
> Add support for ROW_NUMBER() window function
> --------------------------------------------
>
> Key: DERBY-2998
> URL: https://issues.apache.org/jira/browse/DERBY-2998
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Thomas Nielsen
> Assignee: Thomas Nielsen
> Priority: Minor
> Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff,
> d2998-4.diff, d2998-4.stat, d2998-5.diff, d2998-5.stat, d2998-6.diff,
> d2998-6.stat, d2998-7.diff, d2998-7.stat, d2998-8.diff, d2998-8.stat,
> d2998-9-derby.log, d2998-9.diff, d2998-9.stat, d2998-doc-1.diff,
> d2998-doc-1.stat, d2998-test.diff, d2998-test.stat, d2998-test2.diff,
> d2998-test2.stat, d2998-test3.diff, d2998-test3.stat
>
>
> As part of implementing the overall OLAP Operations features of SQL
> (DERBY-581), implement the ROW_NUMBER() window function.
> More information about this feature is available at
> http://wiki.apache.org/db-derby/OLAPRowNumber
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.