[jira] [Commented] (FLINK-7001) Improve performance of Sliding Time Window with pane optimization

Rong Rong (JIRA) Mon, 07 May 2018 08:17:40 -0700

    [ https://issues.apache.org/jira/browse/FLINK-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466014#comment-16466014 ]


Rong Rong commented on FLINK-7001:
----------------------------------

Thanks [~pgrulich], this is definitely a great solution for handling 
high-frequency, long sliding windows.

I briefly went over the paper and have a few questions regarding the use cases 
and compatibility. 

* The non-overlapping slice separator + slice manager approach is a very elegant 
way to reduce buffer memory usage, and having a single slice manager handle 
out-of-order messages by updating the stored slices is definitely great. My 
concern, like [~StephanEwen]'s, is mainly about backward compatibility and 
RocksDB compatibility. 

* Another point is the complexity of partial aggregates vs. final aggregates. 
The paper says little about the "Window manager", and it seems to assume that 
computing the final aggregate over the partial results has the same time/space 
complexity as computing the partial aggregates. Most of the built-in aggregate 
functions we currently have in Flink satisfy this assumption. However, there 
are some complex aggregate functions whose "merge" method might be much more 
expensive than their "accumulate" method. Would we have to consider the 
trade-off between these two approaches? 

* https://issues.apache.org/jira/browse/FLINK-5387 seems to suggest that there 
are trade-offs when using aligned-window approaches. We can probably extend 
that discussion here. 
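
To make the second point concrete, here is a minimal Python sketch of two aggregates built from per-pane partial results. The class and function names (`SumAgg`, `DistinctCountAgg`, `build_pane`, `window_from_panes`) are hypothetical and only loosely modeled on an accumulate/merge style interface; they are not Flink's actual classes. For the sum, "merge" costs the same as "accumulate". For the exact distinct count, "accumulate" is a single set insertion, while "merge" is a set union whose cost grows with the size of the partial state.

```python
# Illustrative only: hypothetical aggregate interfaces, not Flink's API.

class SumAgg:
    def create(self):
        return 0

    def accumulate(self, acc, value):
        # O(1) per record
        return acc + value

    def merge(self, a, b):
        # O(1) per pair of partial aggregates
        return a + b

    def result(self, acc):
        return acc


class DistinctCountAgg:
    def create(self):
        return set()

    def accumulate(self, acc, value):
        # Amortized O(1) per record
        acc.add(value)
        return acc

    def merge(self, a, b):
        # Set union: cost proportional to the sizes of both partial states
        return a | b

    def result(self, acc):
        return len(acc)


def build_pane(agg, values):
    """Partial aggregate for one pane, built with accumulate only."""
    acc = agg.create()
    for value in values:
        acc = agg.accumulate(acc, value)
    return acc


def window_from_panes(agg, panes):
    """Final aggregate for one window, built with merge only."""
    acc = agg.create()
    for pane in panes:
        acc = agg.merge(acc, pane)
    return agg.result(acc)
```

With panes built from `[1, 2, 2]` and `[2, 3]`, `window_from_panes(DistinctCountAgg(), panes)` yields 3, but each window has to union all of its pane sets again, which is where the merge cost could dominate.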

Thanks,
Rong

> Improve performance of Sliding Time Window with pane optimization
> -----------------------------------------------------------------
>
>                 Key: FLINK-7001
>                 URL: https://issues.apache.org/jira/browse/FLINK-7001
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Jark Wu
>            Assignee: Jark Wu
>            Priority: Major
>
> Currently, the implementation of time-based sliding windows treats each 
> window individually and replicates records to each window. For a 10-minute 
> window that slides by 1 second, the data is replicated 600-fold (10 
> minutes / 1 second). We can optimize sliding windows by dividing windows into 
> panes (aligned with the slide), so that we can avoid record duplication and 
> leverage the checkpoint.
> I will attach a more detailed design doc to the issue.
> The following issues are similar to this issue: FLINK-5387, FLINK-6990
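
The pane idea in the description above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the proposed Flink implementation: timestamps are plain integers, the window size is a multiple of the slide, the aggregate is a sum, and window offsets, watermarks, late data, and state backends are ignored. The function names are hypothetical.

```python
from collections import defaultdict


def replication_factor(size, slide):
    """Number of windows each record is copied into without panes."""
    return size // slide


def pane_sums(records, slide):
    """Assign each (timestamp, value) record to exactly one pane.

    A pane is identified by its start timestamp, aligned with the slide.
    """
    panes = defaultdict(int)
    for ts, value in records:
        panes[ts // slide * slide] += value
    return dict(panes)


def sliding_window_sums(records, size, slide):
    """Sum per window [start, start + size), computed from pane sums."""
    if size % slide != 0:
        raise ValueError("this sketch assumes size is a multiple of slide")
    panes = pane_sums(records, slide)
    if not panes:
        return {}
    first_start = min(panes) - size + slide  # earliest window touching data
    last_start = max(panes)                  # latest window touching data
    results = {}
    for start in range(first_start, last_start + slide, slide):
        # Each window combines size / slide panes instead of raw records.
        results[start] = sum(
            panes.get(p, 0) for p in range(start, start + size, slide)
        )
    return results
```

For the example in the description, `replication_factor(600, 1)` gives 600 (10 minutes expressed in seconds), whereas with panes each record is stored once and every window reads 600 pane results.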



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
