GitHub user sirpkt opened a pull request:
https://github.com/apache/tajo/pull/454
TAJO-1415: Window frame support
It supports all ROWS and RANGE window frames.
Cases where the window frame is not applied:
- no ORDER BY clause is used
- built-in window functions that do not support window frames:
row_number, rank, dense_rank, percent_rank, cume_dist, tile, lag, lead
Cases where the window frame should be applied:
- other built-in window functions: first_value, last_value, nth_value
- normal aggregation functions
Based on the above, this patch distinguishes the following three window
function types:
1. built-in window function without window frame support
2. built-in window function with window frame support
3. normal aggregation functions used as window functions, for which the
window frame should be supported
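
As an illustration of the three function types above, here is a minimal Python sketch (not the patch's Java code). The enum, the set names, and the helpers `classify_function` and `frame_applies` are hypothetical; only the function names in the two sets come from the lists above.

```python
from enum import Enum


class WindowFunctionType(Enum):
    """Hypothetical names for the three function types listed above."""
    BUILTIN_NO_FRAME = 1    # built-in window function without frame support
    BUILTIN_WITH_FRAME = 2  # built-in window function with frame support
    AGGREGATION = 3         # normal aggregation used as a window function


# Function names taken from the lists in this description.
NO_FRAME_BUILTINS = {"row_number", "rank", "dense_rank", "percent_rank",
                     "cume_dist", "tile", "lag", "lead"}
FRAME_BUILTINS = {"first_value", "last_value", "nth_value"}


def classify_function(name: str) -> WindowFunctionType:
    """Map a function name to one of the three types."""
    lowered = name.lower()
    if lowered in NO_FRAME_BUILTINS:
        return WindowFunctionType.BUILTIN_NO_FRAME
    if lowered in FRAME_BUILTINS:
        return WindowFunctionType.BUILTIN_WITH_FRAME
    # Assumption: anything else is a normal aggregation function.
    return WindowFunctionType.AGGREGATION


def frame_applies(name: str, has_order_by: bool) -> bool:
    """A frame is applied only with ORDER BY and a frame-aware function."""
    if not has_order_by:
        return False
    return classify_function(name) is not WindowFunctionType.BUILTIN_NO_FRAME
```

For example, `frame_applies("sum", True)` is true, while `frame_applies("rank", True)` and `frame_applies("sum", False)` are both false.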
It further distinguishes the following four window frame types:
1. entire partition
2. from the start of the partition to a moving end point relative to the
current row
3. from a moving start point relative to the current row to the end of the
partition
4. sliding frame, where both end points move as the current row position varies
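
The four frame types can be told apart from whether each frame bound is unbounded. The following Python sketch is only an illustration under that assumption; `FrameType` and `classify_frame` are hypothetical names and do not correspond to the data structures added to plan.proto.

```python
from enum import Enum


class FrameType(Enum):
    """Hypothetical names for the four frame types listed above."""
    ENTIRE_PARTITION = 1
    TO_CURRENT_ROW = 2    # partition start to a moving end point
    FROM_CURRENT_ROW = 3  # moving start point to partition end
    SLIDING = 4           # both end points move with the current row


def classify_frame(start_unbounded: bool, end_unbounded: bool) -> FrameType:
    """Classify a frame by its bounds.

    start_unbounded: the frame starts at UNBOUNDED PRECEDING
    end_unbounded:   the frame ends at UNBOUNDED FOLLOWING
    """
    if start_unbounded and end_unbounded:
        return FrameType.ENTIRE_PARTITION
    if start_unbounded:
        return FrameType.TO_CURRENT_ROW
    if end_unbounded:
        return FrameType.FROM_CURRENT_ROW
    return FrameType.SLIDING
```

Under this sketch, a frame such as `ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING` maps to type 2, and `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING` maps to type 4.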
Case 1 is handled the same way window functions were handled before this patch.
Case 2 is handled by incremental termination of the aggregation function:
for every row, merge() and terminate() of the given function are called.
Case 3 is handled almost the same way as case 2, except that rows are fed to
the function from the end of the partition toward the start of the frame,
i.e., in reverse order.
Case 4 is handled by a two-pass approach: for each row, a small loop feeds
the rows of that row's frame to the function. I think this is inevitable,
since aggregation functions do not support sliding window aggregation.
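
The handling of cases 2 to 4 for ROWS frames can be sketched as follows. This is a simplified Python illustration, not the WindowAggExec code: SUM stands in for an arbitrary aggregation function, adding to `acc` plays the role of merge(), reading `acc` plays the role of terminate(), and all function and parameter names are hypothetical. Offsets are relative to the current row (negative for PRECEDING, positive for FOLLOWING), and an empty frame yields None.

```python
def running_from_start(values, end_offset):
    """Case 2: frame is [partition start, current row + end_offset]."""
    n = len(values)
    results = []
    acc = 0
    fed = 0  # index of the next row to merge
    for i in range(n):
        last = min(i + end_offset, n - 1)
        while fed <= last:       # merge only rows not seen yet
            acc += values[fed]
            fed += 1
        results.append(acc if last >= 0 else None)  # terminate per row
    return results


def running_to_end(values, start_offset):
    """Case 3: frame is [current row + start_offset, partition end].

    Rows are fed in reverse order, from the end of the partition.
    """
    n = len(values)
    results = [None] * n
    acc = 0
    fed = n - 1  # index of the next row to merge, moving backwards
    for i in range(n - 1, -1, -1):
        first = max(i + start_offset, 0)
        while fed >= first:
            acc += values[fed]
            fed -= 1
        results[i] = acc if first <= n - 1 else None
    return results


def sliding(values, start_offset, end_offset):
    """Case 4: recompute the aggregate with a small loop for every row."""
    n = len(values)
    results = []
    for i in range(n):
        first = max(i + start_offset, 0)
        last = min(i + end_offset, n - 1)
        if first > last:
            results.append(None)
            continue
        acc = 0  # fresh aggregation state for this row's frame
        for j in range(first, last + 1):
            acc += values[j]
        results.append(acc)
    return results
```

With `values = [1, 2, 3, 4]`, `running_from_start(values, 0)` gives `[1, 3, 6, 10]`, `running_to_end(values, 0)` gives `[10, 9, 7, 4]`, and `sliding(values, -1, 1)` gives `[3, 6, 9, 7]`. The first two visit each row once per partition, while the sliding case re-reads up to a full frame per row, which is the cost the description above calls inevitable.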
All of the above is implemented for ROWS first,
and then extended to support RANGE by including the rows that have the same
ORDER BY value as the current row in the computation of the window function.
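
To illustrate how RANGE differs from ROWS, the Python sketch below computes a running SUM for the frame `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. It is an assumption-laden illustration rather than the patch's code: `running_sum_range` is a hypothetical name, SUM stands in for any aggregation, and `keys` is assumed to be already sorted by the ORDER BY value.

```python
def running_sum_range(keys, values):
    """Running SUM where rows with equal ORDER BY keys share one result.

    keys:   ORDER BY values, assumed sorted
    values: the values being aggregated, aligned with keys
    """
    n = len(keys)
    results = [None] * n
    acc = 0
    i = 0
    while i < n:
        # Merge every row that has the same key as row i (its peers).
        j = i
        while j < n and keys[j] == keys[i]:
            acc += values[j]
            j += 1
        # All peers see the same frame, so they get the same result.
        for k in range(i, j):
            results[k] = acc
        i = j
    return results
```

For `keys = [1, 2, 2, 3]` and `values = [10, 20, 30, 40]` this returns `[10, 60, 60, 100]`, whereas the corresponding ROWS frame would give `[10, 30, 60, 100]`.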
This patch includes the following changes:
- the parser can handle integer offsets with PRECEDING and FOLLOWING
- ExprAnnotator reflects window frame information on WindowFunctionEval,
including default value handling
- WindowAggExec can handle ROWS and RANGE with window frame support
- parameter checking is included in the parser and ExprAnnotator
- last_value is re-implemented as a WindowAggFunc; the first_value
implementation becomes simpler
- window-related classes in tajo-plan have a new prefix 'Logical' to
distinguish them from the classes of the same name in tajo-algebra
- plan.proto is modified to support data structures that distinguish function
types and frame types
- test cases for window frames are added
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sirpkt/tajo TAJO-1415
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/454.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #454
----
commit 3e2cfabb8513ae730cdef7c42347a5730df47988
Author: Keuntae Park <[email protected]>
Date: 2015-03-22T13:38:30Z
window frame ROWS support is added, RANGE is not supported yet
commit ddd6797d3c029ec984e0e3f99eb68755ee05261f
Author: Keuntae Park <[email protected]>
Date: 2015-03-22T13:39:56Z
Merge remote-tracking branch 'upstream/master' into TAJO-1415
commit ddc7c1b2d1a2c8ad8cc5bee8f8b6141a13116973
Author: Keuntae Park <[email protected]>
Date: 2015-03-23T00:39:09Z
bug fix during master merge
commit aa97dbba3b055cc598667c5c167607b62cf64de3
Author: Keuntae Park <[email protected]>
Date: 2015-03-23T06:56:56Z
support for RANGE window frame
commit 7b21415dfc2e67508d4cca192aa69e6f3bede68d
Author: Keuntae Park <[email protected]>
Date: 2015-03-23T06:57:11Z
Merge remote-tracking branch 'upstream/master' into TAJO-1415
commit 973d99fd33f819387dc33f29b775f02ede860198
Author: Keuntae Park <[email protected]>
Date: 2015-03-23T07:56:34Z
Fix bug for no order by case, where window function SHOULD work on the
entire partition
----