[
https://issues.apache.org/jira/browse/TAJO-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491529#comment-14491529
]
ASF GitHub Bot commented on TAJO-1415:
--------------------------------------
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/454#discussion_r28204392
--- Diff:
tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/WindowAggExec.java
---
@@ -301,30 +301,327 @@ private void evaluationWindowFrame() {
}
for (int idx = 0; idx < functions.length; idx++) {
+ comp = null;
if (orderedFuncFlags[idx]) {
- comp = new BaseTupleComparator(inSchema,
functions[idx].getSortSpecs());
- Collections.sort(accumulatedInTuples, comp);
- comp = new BaseTupleComparator(schemaForOrderBy,
functions[idx].getSortSpecs());
+ SortSpec[] sortSpecs = functions[idx].getSortSpecs();
+ comp = new BaseTupleComparator(schemaForOrderBy, sortSpecs);
Collections.sort(evaluatedTuples, comp);
+ // following comparator is used later when RANGE unit is handled
to check whether order by value is changed or not
+ comp = new BaseTupleComparator(inSchema, sortSpecs);
+ Collections.sort(accumulatedInTuples, comp);
}
- for (int i = 0; i < accumulatedInTuples.size(); i++) {
- Tuple inTuple = accumulatedInTuples.get(i);
- Tuple outTuple = evaluatedTuples.get(i);
+ LogicalWindowSpec.LogicalWindowFrame.WindowFrameType windowFrameType
= functions[idx].getLogicalWindowFrame().getFrameType();
+ WindowSpec.WindowFrameUnit windowFrameUnit =
functions[idx].getLogicalWindowFrame().getFrameUnit();
+ int frameStart = 0, frameEnd = accumulatedInTuples.size() - 1;
+ int startOffset =
functions[idx].getLogicalWindowFrame().getStartBound().getNumber();
+ int endOffset =
functions[idx].getLogicalWindowFrame().getEndBound().getNumber();
+ functions[idx].bind(inSchema);
+
+ /*
+ Following code handles evaluation of window functions with two
nested switch statements
+ Basically, ROWS handling has more cases then RANGE handling
+ First switch distinguishes among
+ 1) built-in window functions without window frame support
+ 2) buiit-in window functions with window frame support
+ 3) aggregation functions, where window frame is supported
+ In window frame support case, there exists four types of window
frame which is also handled by switch statement
+ a) Entire window partition
+ b) From the start of window partition to the moving end point
relative to current row position
+ c) From the moving start point relative to current row
position to the end of window partition
+ d) Both start point and end point of window frame are moving
relative to the current row position
+
+ In the case of RANGE, there can be three window frame type
+ i) From the start of window partition to the last row that has
the same order by key as the current row
+ ii) From the first row that has the same order by key as the
current row to the end of window partition
+ iii) For all rows that has the same order by key as the
current row
+ */
--- End diff --
Thanks for detailed and nice comments!
By the way, the code block below looks quite complicated. Would you refactor
it into several well-defined functions?
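One possible shape for such a function, as a minimal sketch only: the names `RowsFrameSketch`, `Bound`, `resolve`, and `rowsFrame` are hypothetical and do not exist in Tajo. It assumes a ROWS frame, zero-based row indexes within a partition, and inclusive frame bounds; a result with start > end stands for an empty frame.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names are part of Tajo.
class RowsFrameSketch {
  // The five bound kinds listed in the frame clause grammar.
  enum Bound { UNBOUNDED_PRECEDING, PRECEDING, CURRENT_ROW, FOLLOWING, UNBOUNDED_FOLLOWING }

  // Maps one frame bound to an unclamped row index within the partition.
  static int resolve(Bound bound, int offset, int currentRow, int partitionSize) {
    switch (bound) {
      case UNBOUNDED_PRECEDING: return 0;
      case PRECEDING:           return currentRow - offset;
      case CURRENT_ROW:         return currentRow;
      case FOLLOWING:           return currentRow + offset;
      case UNBOUNDED_FOLLOWING: return partitionSize - 1;
      default: throw new IllegalArgumentException("unknown bound: " + bound);
    }
  }

  // Returns {start, end} (inclusive), clamped to the partition.
  // start > end means the frame contains no rows.
  static int[] rowsFrame(Bound startBound, int startOffset,
                         Bound endBound, int endOffset,
                         int currentRow, int partitionSize) {
    int start = Math.max(0, resolve(startBound, startOffset, currentRow, partitionSize));
    int end = Math.min(partitionSize - 1,
        resolve(endBound, endOffset, currentRow, partitionSize));
    return new int[] {start, end};
  }

  public static void main(String[] args) {
    // ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING over a five-row partition.
    for (int row = 0; row < 5; row++) {
      int[] frame = rowsFrame(Bound.PRECEDING, 1, Bound.FOLLOWING, 1, row, 5);
      System.out.println("row " + row + " -> " + Arrays.toString(frame));
    }
  }
}
```

With the bound resolution isolated like this, the four frame cases (a) to (d) in the comment above reduce to different combinations of start and end bound kinds instead of separate switch branches.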
> Window frame support
> --------------------
>
> Key: TAJO-1415
> URL: https://issues.apache.org/jira/browse/TAJO-1415
> Project: Tajo
> Issue Type: Sub-task
> Components: distributed query plan, parser, physical operator,
> planner/optimizer
> Reporter: Keuntae Park
> Assignee: Keuntae Park
> Fix For: window function
>
>
> We can define a frame clause in a window definition like
> {code}
> [ RANGE | ROWS ] frame_start
> [ RANGE | ROWS ] BETWEEN frame_start AND frame_end
> {code}
> , where frame_start and frame_end can be one of
> {code}
> UNBOUNDED PRECEDING
> value PRECEDING
> CURRENT ROW
> value FOLLOWING
> UNBOUNDED FOLLOWING
> {code}
> According to the window functions description of
> PostgreSQL(http://www.postgresql.org/docs/9.4/static/functions-window.html),
> there are two types of window functions based on window frame support.
> 1) row_number, rank, dense_rank, percent_rank, cume_dist, ntile, lag and lead:
> these functions only work within the window partition, which means the window
> frame has no effect on them.
> 2) first_value, last_value, nth_value, and aggregate functions used as window
> functions: these functions should work with the rows within the window frame.
> Currently, the Tajo parser recognizes the window frame grammar, but
> WindowAggExec does not use that information.
> It works as if the window frame were set to "RANGE BETWEEN UNBOUNDED PRECEDING
> AND UNBOUNDED FOLLOWING", which differs from the default window frame
> setting of most DBMSs, "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".
> The following should be done:
> 1) Applying the correct default window frame for first_value, last_value,
> nth_value, and aggregation functions.
> 2) Supporting various window frame expressions.
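To make the difference between the two default frames in the description concrete, here is a minimal sketch in plain Java. It is not Tajo code: `DefaultFrameSketch` and its methods are hypothetical names, and the input is assumed to be a single partition already sorted by an integer ORDER BY key. It computes sum() under "RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" and under "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", where the latter frame ends at the last row sharing the current row's ORDER BY key.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of Tajo.
class DefaultFrameSketch {
  // sum() with RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING:
  // every row sees the whole partition.
  static long[] sumEntirePartition(long[] values) {
    long total = 0;
    for (long v : values) {
      total += v;
    }
    long[] out = new long[values.length];
    Arrays.fill(out, total);
    return out;
  }

  // sum() with RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
  // the frame ends at the last peer of the current row, so rows with
  // equal ORDER BY keys all receive the same running total.
  static long[] sumUpToCurrentPeers(int[] sortedKeys, long[] values) {
    int n = values.length;
    long[] out = new long[n];
    long running = 0;
    int i = 0;
    while (i < n) {
      int j = i;
      // Accumulate the whole peer group that starts at row i.
      while (j < n && sortedKeys[j] == sortedKeys[i]) {
        running += values[j];
        j++;
      }
      for (int k = i; k < j; k++) {
        out[k] = running;
      }
      i = j;
    }
    return out;
  }

  public static void main(String[] args) {
    int[] keys = {1, 2, 2, 3};
    long[] values = {10, 20, 30, 40};
    System.out.println(Arrays.toString(sumEntirePartition(values)));
    System.out.println(Arrays.toString(sumUpToCurrentPeers(keys, values)));
  }
}
```

For keys 1, 2, 2, 3 with values 10, 20, 30, 40, the first frame yields 100 for every row, while the second yields 10, 60, 60, 100.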