[jira] [Comment Edited] (FLINK-13056) Optimize region failover performance on calculating vertices to restart

Zhu Zhu (JIRA) Fri, 16 Aug 2019 07:46:09 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909096#comment-16909096
 ]


Zhu Zhu edited comment on FLINK-13056 at 8/16/19 2:45 PM:
----------------------------------------------------------

The diff can be found at 
[https://github.com/zhuzhurk/flink/commit/4f7da57b218e9ccd86f468f9ece62ee1e378ceda].

Note that this diff is based on the initial version of 
flip1.RestartPipelinedRegionStrategy, so it cannot be applied directly to the 
latest flip1.RestartPipelinedRegionStrategy: the region building was later 
refactored out of it (for partition releasing).

The perf test case used (RegionFailoverPerfTest#complexPerfTest) can be found in 
the same branch.

I agree that it's better to make this optimization configurable, as it has side 
effects.


was (Author: zhuzh):
The diff can be found at 
[https://github.com/zhuzhurk/flink/commit/4f7da57b218e9ccd86f468f9ece62ee1e378ceda].

Note that this diff is based on the initial version of 
flip1.RestartPipelinedRegionStrategy, so it cannot be applied directly to the 
latest flip1.RestartPipelinedRegionStrategy: the region building was later 
refactored out of it (for partition releasing).

The perf test case used (RegionFailoverPerfTest#complexPerfTest) can be found in 
the same branch.

> Optimize region failover performance on calculating vertices to restart
> -----------------------------------------------------------------------
>
>                 Key: FLINK-13056
>                 URL: https://issues.apache.org/jira/browse/FLINK-13056
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0
>            Reporter: Zhu Zhu
>            Assignee: Zhu Zhu
>            Priority: Major
>
> Currently, some region boundary structures are recalculated on every region 
> failover. This calculation can be expensive, as its complexity grows with the 
> execution edge count.
> We tested it in a sample case with 8000 vertices and 16,000,000 edges. It 
> takes ~2.0s to calculate the vertices to restart.
> (more details in 
> [https://docs.google.com/document/d/197Ou-01h2obvxq8viKqg4FnOnsykOEKxk3r5WrVBPuA/edit?usp=sharing])
> We therefore propose caching the region boundary structures to improve 
> the region failover performance.
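
The caching idea described above can be illustrated with a minimal Java sketch. 
This is not the code from the linked diff and not Flink's 
flip1.RestartPipelinedRegionStrategy; the class RegionBoundaryCache, its fields, 
and the cacheEnabled flag are hypothetical names chosen for illustration. It 
assumes a simplified model in which a failed region restarts together with all 
regions that transitively consume its output, and in which finding a region's 
consumer regions requires a scan over every edge.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of caching region boundary lookups; not Flink code. */
final class RegionBoundaryCache {
    private final int[][] edges;         // each edge is {producerVertex, consumerVertex}
    private final int[] vertexToRegion;  // index = vertex id, value = region id
    private final boolean cacheEnabled;  // assumed switch to make the optimization optional
    private final Map<Integer, Set<Integer>> boundaryCache = new HashMap<>();
    private int edgeScans = 0;           // counts full scans over the edge list

    RegionBoundaryCache(int[][] edges, int[] vertexToRegion, boolean cacheEnabled) {
        this.edges = edges;
        this.vertexToRegion = vertexToRegion;
        this.cacheEnabled = cacheEnabled;
    }

    /** Expensive step: walks every edge to find regions consuming from the given region. */
    private Set<Integer> scanConsumerRegions(int region) {
        edgeScans++;
        Set<Integer> result = new HashSet<>();
        for (int[] edge : edges) {
            int from = vertexToRegion[edge[0]];
            int to = vertexToRegion[edge[1]];
            if (from == region && to != region) {
                result.add(to);
            }
        }
        return result;
    }

    /** Returns the boundary of a region, reusing an earlier scan when caching is on. */
    private Set<Integer> consumerRegions(int region) {
        if (!cacheEnabled) {
            return scanConsumerRegions(region);
        }
        return boundaryCache.computeIfAbsent(region, this::scanConsumerRegions);
    }

    /** Breadth-first walk from the failed region across region boundaries. */
    Set<Integer> regionsToRestart(int failedRegion) {
        Set<Integer> toRestart = new HashSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        toRestart.add(failedRegion);
        pending.add(failedRegion);
        while (!pending.isEmpty()) {
            int current = pending.poll();
            for (int next : consumerRegions(current)) {
                if (toRestart.add(next)) {
                    pending.add(next);
                }
            }
        }
        return toRestart;
    }

    int edgeScans() {
        return edgeScans;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 3}};
        int[] vertexToRegion = {0, 0, 1, 2};
        RegionBoundaryCache cached = new RegionBoundaryCache(edges, vertexToRegion, true);
        cached.regionsToRestart(0);
        cached.regionsToRestart(0); // second failover reuses the cached boundaries
        System.out.println("edge scans with cache: " + cached.edgeScans());
    }
}
```

In this sketch a repeated failover of the same region triggers no further edge 
scans when the cache is enabled, at the cost of keeping the boundary sets in 
memory. The boolean flag only stands in for whatever configuration mechanism 
would actually be used.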



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
