[
https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573867#action_12573867
]
A B commented on DERBY-2998:
----------------------------
Thanks for patch 19, Thomas. I think your approach to handling the virtual
column nodes is a good one (certainly better than the alternatives seen thus
far). In the interest of getting something into the codeline for others to
play with--and to simplify future patches--per Dan's comment a few days back,
I'll try to commit patch 19 sometime today.
Once that is committed, perhaps you can post one or more follow-up patches to
address the following un-related comments, which I noticed while looking into
the VCN issue. Of these # 3 is probably the most important one...
1) Instead of using "row_number()" for the default name of a ROW_NUMBER()
column, I think it'd be better to use a normal generated column name, as is
done for all other un-named expressions in Derby (I believe?). So something
like:
select i, j, i+j, row_number() over() from t1
should return four columns, "I", "J", "3", and "4", where the last two are
generic generated names used for unnamed expressions. With patch 19 such a
query currently returns "I", "J", "3", "row_number()", which deviates from the
existing pattern.
2) Regarding the following logic in PredicateList.pushExpressionsIntoSelect():
ColumnReference ref = (ColumnReference)e.nextElement();
if (!ref.pointsToColumnReference() &&
!ref.pointsToWindowFunction())
{
state = false;
break;
}
I think it says "if the column reference points to something that is neither a)
another column reference NOR b) a window function, then do *not* push it down."
So if the column reference points to a window function, we we could still
potentially push it down. Is that what we want? For the record, I commented
the call to "pointsToWindowFunction()" out completely and all of the queries
still ran the same. So is this check still necessary?
3) The following query returns one row, regardless of how many rows there are
in T1:
select distinct row_number() over(), 'HMM' from t1
It appears that we will evaluate the entire SELECT result set, then apply the
DISTINCT, and finally, assign the row numbers. So if there are five rows in
T1, we'll get five rows having a single "HMM" column, then we'll apply the
distinct to get a single "HMM" row, and that row will have row number "1". But
is that the correct behavior? Or is the row_number() supposed to be included
in the "DISTINCT" qualification, in which case the query would return 5 rows
because each row has a different row number and thus each is distinct from the
others?
Just to see what would happen I ran the above query on DB2 v8 and it returned
one row for each row in T1. I took a look at the syntax rules for <window
function> in SQL 2003 spec section 6.10 but could not, in my inexperience with
reading specs, determine one way or the other. That said, though, I think
functions in a ResultColumnList are typically evaluated once per row, *prior*
to the DISTINCT being applied, so I would guess ROW_NUMBER() should do the same?
4) Might be good to add some test cases for GROUP BY and HAVING clauses in the
presence of ROW_NUMBER(). I tried a few quick ones and they seem to be working
fine, but it wouldn't hurt to add some tests for posterity. These are the ones
I tried, but use your imagination :)
select r from (select i, row_number() over() as r, j from t1) x group by r;
select * from (select i, row_number() over() as r, j from t1) x group by i,
j, r;
select * from (select i, row_number() over() as r, j from t1) x group by j,
r, i;
select * from
(select i, row_number() over() as r, j from t1) x
group by j, r, i
having r > 2;
select * from
(select i, row_number() over() as r, j from t1) x
group by j, r, i
having r > 2 and i >=3
order by i desc;
select * from
(select i, row_number() over() as r, j from t1) x
group by j, r, i
having r > 2 and i >=3
order by r desc;
select * from
(select i, row_number() over() as r, j from t1) x
group by j, r, i
having r > 2 and i >=3
order by i asc, r desc;
And yes I realize that grouping by a ROW_NUMBER() is kind of silly since row
numbers are unique across rows--but it's good make sure the behavior is correct
:)
5) SQL spec indicates that window functions like ROW_NUMBER() can be used in an
ORDER BY clause, but that is not yet implemented (right?). Do you think it
would be worth it to add a statement saying as much to the documentation?
> Add support for ROW_NUMBER() window function
> --------------------------------------------
>
> Key: DERBY-2998
> URL: https://issues.apache.org/jira/browse/DERBY-2998
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Thomas Nielsen
> Assignee: Thomas Nielsen
> Priority: Minor
> Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff,
> d2998-12.diff, d2998-12.stat, d2998-13.diff, d2998-13.stat, d2998-14.diff,
> d2998-14.stat, d2998-15.diff, d2998-15.stat, d2998-16.diff, d2998-16.stat,
> d2998-17.diff, d2998-17.stat, d2998-18.diff, d2998-18.stat, d2998-19.diff,
> d2998-19.stat, d2998-4.diff, d2998-4.stat, d2998-5.diff, d2998-5.stat,
> d2998-6.diff, d2998-6.stat, d2998-7.diff, d2998-7.stat, d2998-8.diff,
> d2998-8.stat, d2998-9-derby.log, d2998-9.diff, d2998-9.stat,
> d2998-doc-1.diff, d2998-doc-1.stat, d2998-test.diff, d2998-test.stat,
> d2998-test2.diff, d2998-test2.stat, d2998-test3.diff, d2998-test3.stat,
> d2998-test4.diff, d2998-test4.stat, d2998-test6.diff, d2998-test7.diff,
> d2998-test8.diff, d2998-test9.diff
>
>
> As part of implementing the overall OLAP Operations features of SQL
> (DERBY-581), implement the ROW_NUMBER() window function.
> More information about this feature is available at
> http://wiki.apache.org/db-derby/OLAPRowNumber
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.