[
https://issues.apache.org/jira/browse/FLINK-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466014#comment-16466014
]
Rong Rong commented on FLINK-7001:
----------------------------------
Thanks [~pgrulich]. This is definitely a great solution for handling
high-frequency, long sliding windows.
I briefly went over the paper and have a few questions regarding the use cases
and compatibility.
* The non-overlapping slice separator + slice manager approach is an elegant
way to reduce buffer memory usage, and having a single slice manager handle
out-of-order messages by updating the stored slices is definitely great. I
share [~StephanEwen]'s concern here, especially regarding backward and RocksDB
compatibility.
* Another point is the complexity of partial aggregates vs. final aggregates.
The paper says little about the "Window manager", and it seems to assume that
computing the final aggregate over the partial results has the same time/space
complexity as computing the partial aggregates themselves.
Most of the built-in aggregate functions we currently have in Flink satisfy
this assumption. However, there are some complex aggregate functions whose
"merge" method might be much more expensive than the "accumulate" method.
Would we have to consider the trade-off between these two approaches?
* https://issues.apache.org/jira/browse/FLINK-5387 seems to suggest there are
trade-offs when using aligned-window approaches. We can probably extend that
discussion here.
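
To make the second point concrete, here is a minimal Python sketch (not Flink
code and not the paper's implementation). All names in it (build_slices,
window_result, add_to_set) are hypothetical, and it assumes the window size is
a multiple of the slide. Each record is accumulated into exactly one
slide-aligned slice, and a window result is produced by merging the slices it
covers. For a sum, merging is as cheap as accumulating; for a distinct-value
set, each merge costs time proportional to the accumulator size, and every
window emission pays size / slide such merges.

```python
from collections import defaultdict

def build_slices(records, slide, create, accumulate):
    """Accumulate each (timestamp, value) record into exactly one slice.

    A slice is keyed by its start timestamp, aligned to the slide.
    """
    slices = defaultdict(create)
    for ts, value in records:
        start = ts - ts % slide
        slices[start] = accumulate(slices[start], value)
    return dict(slices)

def window_result(slices, window_end, size, slide, create, merge):
    """Merge the partial aggregates of all slices in [window_end - size, window_end)."""
    acc = create()
    for start in range(window_end - size, window_end, slide):
        if start in slices:
            acc = merge(acc, slices[start])
    return acc

def add_to_set(acc, value):
    """Accumulate step for a distinct-values aggregate: O(1) per record."""
    acc.add(value)
    return acc

records = [(0, 5), (1, 7), (3, 5), (4, 2), (5, 7)]
SIZE, SLIDE = 4, 2

# Sum: accumulate and merge are both constant-time additions.
sum_slices = build_slices(records, SLIDE, lambda: 0, lambda acc, v: acc + v)
sum_window = window_result(sum_slices, 6, SIZE, SLIDE,
                           lambda: 0, lambda a, b: a + b)

# Distinct values: accumulate is O(1), but merge is a set union whose cost
# grows with the number of distinct values held in the slices.
distinct_slices = build_slices(records, SLIDE, set, add_to_set)
distinct_window = window_result(distinct_slices, 6, SIZE, SLIDE,
                                set, lambda a, b: a | b)
```

In this sketch the window ending at timestamp 6 covers the slices starting at
2 and 4. Whether the extra merge cost outweighs the savings from not
replicating records would depend on the aggregate function and on the
size / slide ratio.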
Thanks,
Rong
> Improve performance of Sliding Time Window with pane optimization
> -----------------------------------------------------------------
>
> Key: FLINK-7001
> URL: https://issues.apache.org/jira/browse/FLINK-7001
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Reporter: Jark Wu
> Assignee: Jark Wu
> Priority: Major
>
> Currently, the implementation of time-based sliding windows treats each
> window individually and replicates records to each window. For a window of
> 10 minutes that slides by 1 second, the data is replicated 600-fold (10
> minutes / 1 second). We can optimize sliding windows by dividing windows into
> panes (aligned with the slide), so that we can avoid record duplication and
> leverage the checkpoint.
> I will attach a more detailed design doc to the issue.
> The following issues are similar to this issue: FLINK-5387, FLINK-6990
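
The 600-fold figure in the issue description can be checked with a short
Python sketch. The function names (assign_windows, assign_pane) are
hypothetical and this is not Flink's implementation; it only assumes that
windows start at multiples of the slide and cover [start, start + size).

```python
def assign_windows(ts, size, slide):
    """Start timestamps of every sliding window [start, start + size) containing ts."""
    last_start = ts - ts % slide
    # Walk backwards in steps of one slide while the window still contains ts.
    return list(range(last_start, ts - size, -slide))

def assign_pane(ts, slide):
    """Start timestamp of the single slide-aligned pane containing ts."""
    return ts - ts % slide

SIZE_MS = 10 * 60 * 1000   # 10-minute window
SLIDE_MS = 1000            # 1-second slide
ts = 123_456               # arbitrary example timestamp in milliseconds

windows = assign_windows(ts, SIZE_MS, SLIDE_MS)
pane = assign_pane(ts, SLIDE_MS)

print(len(windows))  # copies of the record when each window is stored individually
print(pane)          # with panes, the record is stored once, in this pane
```

With per-window state the record is written len(windows) times, while with
panes it is written once and each window is later assembled from
size / slide panes.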