[ 
https://issues.apache.org/jira/browse/FLINK-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492070#comment-16492070
 ] 

Rong Rong commented on FLINK-7001:
----------------------------------

Hi [~jark], [~StephanEwen],

I was wondering whether there has been any further development since the last 
discussion?

On our side, we are investigating several extreme use cases for sliding 
windows (for example, a customized aggregation on a 15-second sliding window 
over 7 days of data). We would love to contribute to, or take the lead on, the 
sliding window performance improvement.

To summarize some of the discussions that [~StephanEwen], [~pgrulich], and 
[~walterddr] had:

- Splitting the current generic window operator into *aligned* window (e.g. 
sliding/tumbling window) and *unaligned* window (e.g. session window) categories 
  -- Further improve performance in each category individually.
  -- Have one timer per window instead of one per window/key combination, if 
possible.

- Deduplication optimization through pane splitting and pane merging on 
*aligned window* operators:
  -- An algorithm that handles pane optimization efficiently and stays 
compatible with early/late firing.
  -- Needs to be compatible with, and work on, the RocksDB state backend.
  -- Backward compatibility with savepoints.
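
To make the pane split / pane merge idea concrete, here is a minimal sketch in 
plain Java. It is not Flink code and not the design proposed in this issue: 
the class name PaneSlidingSum, the in-memory TreeMap, and the restriction to a 
sum aggregate with a window size that is an exact multiple of the slide are 
all assumptions made for illustration. Timers, keyed state, early/late firing, 
and savepoint compatibility are left out.

```java
import java.util.TreeMap;

/** Hypothetical sketch of pane-based sliding-window summation (not Flink code). */
class PaneSlidingSum {
    private final long size;   // window size, in the same unit as the timestamps
    private final long slide;  // window slide; assumed to divide size evenly
    // One partial sum per pane, keyed by the pane's start timestamp.
    private final TreeMap<Long, Long> panes = new TreeMap<>();

    PaneSlidingSum(long size, long slide) {
        if (slide <= 0 || size % slide != 0) {
            throw new IllegalArgumentException("sketch assumes size is a multiple of slide");
        }
        this.size = size;
        this.slide = slide;
    }

    /** Pane split: each record is accumulated into exactly one pane. */
    void add(long timestamp, long value) {
        long paneStart = Math.floorDiv(timestamp, slide) * slide;
        panes.merge(paneStart, value, Long::sum);
    }

    /** Pane merge: the window [windowEnd - size, windowEnd) is the sum of its panes. */
    long fire(long windowEnd) {
        long result = 0;
        for (long partial : panes.subMap(windowEnd - size, true, windowEnd, false).values()) {
            result += partial;
        }
        return result;
    }

    /** Drops panes that no window ending after windowEnd can still cover. */
    void evictBefore(long windowEnd) {
        panes.headMap(windowEnd - size + slide, false).clear();
    }
}
```

Each record is stored once, in the pane aligned with the slide, and a window 
result is assembled from size / slide partial results when the window fires.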

- An efficiency trade-off mechanism that selects the optimization method (pane 
split, traditional, etc.) depending on:
  -- Split accumulate operation vs. merge operation complexity.
  -- Latency vs. complexity vs. memory footprint.
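
As a rough illustration of the accumulate vs. merge trade-off, the sketch 
below counts operations per slide interval for one key. It is only a 
back-of-the-envelope model: the class and method names are hypothetical, 
accumulate and merge are both treated as unit cost, the window size is assumed 
to be an exact multiple of the slide, and state-backend access cost and 
latency are ignored.

```java
/** Hypothetical operation-count model for one key and one slide interval. */
class SlidingWindowCost {
    /** Number of windows each record belongs to, i.e. the replication factor. */
    static long windowsPerRecord(long sizeMs, long slideMs) {
        return sizeMs / slideMs;
    }

    /** Traditional approach: every record is accumulated into every window it belongs to. */
    static long replicatedOps(long recordsPerSlide, long sizeMs, long slideMs) {
        return recordsPerSlide * windowsPerRecord(sizeMs, slideMs);
    }

    /** Pane approach: one accumulate per record, plus one merge per pane when a window fires. */
    static long paneOps(long recordsPerSlide, long sizeMs, long slideMs) {
        return recordsPerSlide + windowsPerRecord(sizeMs, slideMs);
    }
}
```

For the 10-minute window sliding by 1 second from the issue description, 
windowsPerRecord gives 600. If the 7-day example is read as a 7-day window 
sliding every 15 seconds, it gives 40320. With 100 records per slide in the 
10-minute case, the model yields 60000 operations for the traditional approach 
and 700 for panes; with a single record per slide the two are roughly equal 
(600 vs. 601), which is where the relative cost of a merge would decide.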

Do you guys think this could be a good starting point for a concrete 
solution?
I tried to summarize and collect as much information as I could; any 
additional comments and suggestions are highly appreciated. 

Thanks,
Rong 

> Improve performance of Sliding Time Window with pane optimization
> -----------------------------------------------------------------
>
>                 Key: FLINK-7001
>                 URL: https://issues.apache.org/jira/browse/FLINK-7001
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Jark Wu
>            Assignee: Jark Wu
>            Priority: Major
>
> Currently, the implementation of time-based sliding windows treats each 
> window individually and replicates records to each window. For a 10-minute 
> window that slides by 1 second, the data is replicated 600-fold (10 
> minutes / 1 second). We can optimize sliding windows by dividing windows into 
> panes (aligned with the slide), so that we can avoid record duplication and 
> leverage the checkpoint.
> I will attach a more detailed design doc to the issue.
> The following issues are similar to this issue: FLINK-5387, FLINK-6990



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
