[
https://issues.apache.org/jira/browse/TAJO-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368284#comment-14368284
]
Keuntae Park commented on TAJO-1415:
------------------------------------
I think the rough flow for resolving this issue is as follows:
1) Distinguish window functions to which a window frame is applicable from
those to which it is not, and override the window frame setting of the
non-applicable functions to
"RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", because that means
the entire window partition.
This is already done for "row_number" in ExprAnnotator, so do the same for the
remaining non-applicable functions.
2) Enhance WindowAggExec to support window frames.
I would handle the window frame by splitting it into four cases:
1) frame_start is UNBOUNDED PRECEDING and frame_end is UNBOUNDED FOLLOWING
Do the same as the current implementation.
2) frame_start is UNBOUNDED PRECEDING and frame_end is a position relative to
CURRENT ROW
Do the same as the current WindowFunc handling; however, the data fed to the
function should be shifted, which means the n-th row in the window partition is
fed as the (n+m)-th row to the function, where m is the position relative to
CURRENT ROW.
3) frame_start is a position relative to CURRENT ROW and frame_end is a
position relative to CURRENT ROW
In this case, the handling of aggregation functions is important. I think there
are two ways to handle this case:
i) add a small iteration that feeds data to the aggregation function, whose
size is the size of the window frame
ii) modify the aggregation functions to support a sliding window mode
I think i) is better than ii) because ii) requires modifying all the
aggregation functions; however, i) has non-negligible overhead when the window
frame becomes large.
4) frame_start is a position relative to CURRENT ROW and frame_end is UNBOUNDED
FOLLOWING
I think this requires feeding data to the function in reverse row order, plus
special handling for first_value, last_value, and nth_value.
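The four cases above can be sketched roughly as follows. This is only an
illustrative Python sketch under stated assumptions, not Tajo code: the names
SumAcc and frame_results are hypothetical, only ROWS-mode frames are modeled,
offsets are relative to the current row (negative = PRECEDING, positive =
FOLLOWING, None = UNBOUNDED), and a simple sum stands in for an aggregation
function. Because sum does not depend on feeding order, the special handling
needed by first_value, last_value, and nth_value in case 4 is not covered.

```python
class SumAcc:
    """Hypothetical stand-in for an incremental aggregation function."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def result(self):
        # Like SQL sum(), yield NULL (None) when no row has been fed.
        return self.total if self.count else None


def frame_results(partition, start, end, new_acc=SumAcc):
    """Evaluate one aggregate per row of a single window partition."""
    n = len(partition)
    out = [None] * n

    if start is None and end is None:
        # Case 1: the frame is the whole partition; aggregate once.
        acc = new_acc()
        for value in partition:
            acc.add(value)
        return [acc.result()] * n

    if start is None:
        # Case 2: one running accumulator with shifted feeding. Before
        # emitting row i, every row up to i + end has been fed.
        acc, fed = new_acc(), 0
        for i in range(n):
            last = min(i + end, n - 1)
            while fed <= last:
                acc.add(partition[fed])
                fed += 1
            out[i] = acc.result()
        return out

    if end is None:
        # Case 4: feed in reverse row order, so the accumulator holds
        # rows i + start .. n - 1 when row i is emitted.
        acc, fed = new_acc(), n
        for i in range(n - 1, -1, -1):
            first = max(i + start, 0)
            while fed > first:
                fed -= 1
                acc.add(partition[fed])
            out[i] = acc.result()
        return out

    # Case 3, option i): a fresh accumulator and a small iteration over
    # the frame for every row; the cost grows with the frame size.
    for i in range(n):
        acc = new_acc()
        for j in range(max(i + start, 0), min(i + end, n - 1) + 1):
            acc.add(partition[j])
        out[i] = acc.result()
    return out
```

For example, frame_results([1, 2, 3, 4], -1, 1) models
"ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING" and re-feeds up to three rows per
output row, which is the overhead of option i) mentioned above.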
Any comments are welcome.
> Window frame support
> --------------------
>
> Key: TAJO-1415
> URL: https://issues.apache.org/jira/browse/TAJO-1415
> Project: Tajo
> Issue Type: Sub-task
> Components: distributed query plan, parser, physical operator,
> planner/optimizer
> Reporter: Keuntae Park
> Fix For: window function
>
>
> We can define a frame clause in a window definition like
> {code}
> [ RANGE | ROWS ] frame_start
> [ RANGE | ROWS ] BETWEEN frame_start AND frame_end
> {code}
> , where frame_start and frame_end can be one of
> {code}
> UNBOUNDED PRECEDING
> value PRECEDING
> CURRENT ROW
> value FOLLOWING
> UNBOUNDED FOLLOWING
> {code}
> According to the window functions description of
> PostgreSQL(http://www.postgresql.org/docs/9.4/static/functions-window.html),
> there are two types of window functions based on window frame support.
> 1) row_number, rank, dense_rank, percent_rank, cume_dist, ntile, lag and lead:
> these functions only work within window partition, which means window frame
> has no effect on these functions.
> 2) first_value, last_value, nth_value, and aggregation functions used as
> window functions: these functions should work with rows within window frame.
> Currently, the Tajo parser recognizes the window frame grammar but
> WindowAggExec does not use that information.
> It works as if the window frame is set to "RANGE BETWEEN UNBOUNDED PRECEDING
> AND UNBOUNDED FOLLOWING", which is different from the default window frame
> setting of most DBMSs, "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".
> The following should be done:
> 1) Applying the correct default window frame for first_value, last_value,
> nth_value, and aggregation functions.
> 2) Supporting various window frame expressions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)