[
https://issues.apache.org/jira/browse/FLINK-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464708#comment-16464708
]
Philipp Grulich commented on FLINK-7001:
----------------------------------------
Hi [~suez1224] ,
here some more details to our approach:
Theory:
Our main focus were multi-query windows, which means that one window operator
could process windows of different sizes or even types, which is currently not
possible in Flink. However, even it would be possible to define such window
operators in the current Flink implementation it would suffer from a low
throughput because all elements of the stream would get replicated to all
window aggregates.
The same problem is also observable in the current Flink implementation with
Sliding Windows when we have a long size and a small slide, as discussed by
[~jark] .
Our approach uses slicing to mitigate this problem. Each element of the stream
gets only inserted into one slice and windows share slices if they overlap.
However, when a window gets emitted we have to combine the individual slices,
which leads to a higher latency (~ 1ms). But our throughput is ~4 times higher
than the current Flink implementation and completely tolerant against the size
of windows.
Current Implementation:
We prototyped this on Flink 1.3 and basically replaced the whole window
operator. We also implemented the multi-query capabilities which are not
necessary for a productive Flink implementation.
The main benefit of our approach is that it enables a general framework for
different kinds of window types. In contrast to this, it has a higher
complexity which could reduce when the multi-query functionality is not
required. But in general, it would basically be a complete rewrite of the
current flink window operator.
In contrast to this, we could also introduce the slicing concept only to
sliding windows. Such that always when a sliding window is used we call a
specialized window operator. This would be easy to implement.
Best,
Philipp
> Improve performance of Sliding Time Window with pane optimization
> -----------------------------------------------------------------
>
> Key: FLINK-7001
> URL: https://issues.apache.org/jira/browse/FLINK-7001
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Reporter: Jark Wu
> Assignee: Jark Wu
> Priority: Major
>
> Currently, the implementation of time-based sliding windows treats each
> window individually and replicates records to each window. For a window of 10
> minute size that slides by 1 second the data is replicated 600 fold (10
> minutes / 1 second). We can optimize sliding window by divide windows into
> panes (aligned with slide), so that we can avoid record duplication and
> leverage the checkpoint.
> I will attach a more detail design doc to the issue.
> The following issues are similar to this issue: FLINK-5387, FLINK-6990
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)