[jira] [Commented] (FLINK-7001) Improve performance of Sliding Time Window with pane optimization

Philipp Grulich (JIRA) Sat, 05 May 2018 01:46:56 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464708#comment-16464708
 ]


Philipp Grulich commented on FLINK-7001:
----------------------------------------

Hi [~suez1224] ,



here some more details to our approach:

Theory:
Our main focus were multi-query windows, which means that one window operator 
could process windows of different sizes or even types, which is currently not 
possible in Flink. However, even it would be possible to define such window 
operators in the current Flink implementation it would suffer from a low 
throughput because all elements of the stream would get replicated to all 
window aggregates.
The same problem is also observable in the current Flink implementation with 
Sliding Windows when we have a long size and a small slide, as discussed by 
[~jark] .

Our approach uses slicing to mitigate this problem. Each element of the stream 
gets only inserted into one slice and windows share slices if they overlap.
However, when a window gets emitted we have to combine the individual slices, 
which leads to a higher latency (~ 1ms). But our throughput is ~4 times higher 
than the current Flink implementation and completely tolerant against the size 
of windows.

Current Implementation:
We prototyped this on Flink 1.3 and basically replaced the whole window 
operator. We also implemented the multi-query capabilities which are not 
necessary for a productive Flink implementation.

The main benefit of our approach is that it enables a general framework for 
different kinds of window types. In contrast to this, it has a higher 
complexity which could reduce when the multi-query functionality is not 
required. But in general, it would basically be a complete rewrite of the 
current flink window operator. 

In contrast to this, we could also introduce the slicing concept only to 
sliding windows. Such that always when a sliding window is used we call a 
specialized window operator. This would be easy to implement.

Best,
Philipp

> Improve performance of Sliding Time Window with pane optimization
> -----------------------------------------------------------------
>
>                 Key: FLINK-7001
>                 URL: https://issues.apache.org/jira/browse/FLINK-7001
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Jark Wu
>            Assignee: Jark Wu
>            Priority: Major
>
> Currently, the implementation of time-based sliding windows treats each 
> window individually and replicates records to each window. For a window of 10 
> minute size that slides by 1 second the data is replicated 600 fold (10 
> minutes / 1 second). We can optimize sliding window by divide windows into 
> panes (aligned with slide), so that we can avoid record duplication and 
> leverage the checkpoint.
> I will attach a more detail design doc to the issue.
> The following issues are similar to this issue: FLINK-5387, FLINK-6990



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (FLINK-7001) Improve performance of Sliding Time Window with pane optimization

Reply via email to