[
https://issues.apache.org/jira/browse/FLINK-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909096#comment-16909096
]
Zhu Zhu edited comment on FLINK-13056 at 8/16/19 2:45 PM:
----------------------------------------------------------
The diff can be found at
[https://github.com/zhuzhurk/flink/commit/4f7da57b218e9ccd86f468f9ece62ee1e378ceda].
Note that this diff is based on the initial version of
flip1.RestartPipelinedRegionStrategy, so it cannot be applied directly to the
latest flip1.RestartPipelinedRegionStrategy, because the region building was
later refactored out of it (for partition releasing).
The perf test case used (RegionFailoverPerfTest#complexPerfTest) can be found in
the same branch.
I agree that it's better to make this optimization configurable, as it has side
effects.
> Optimize region failover performance on calculating vertices to restart
> -----------------------------------------------------------------------
>
> Key: FLINK-13056
> URL: https://issues.apache.org/jira/browse/FLINK-13056
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
>
> Currently some region boundary structures are recalculated on every
> region failover. This calculation can be heavy, as its complexity grows with
> the execution edge count.
> We tested it on a sample case with 8000 vertices and 16,000,000 edges. It
> takes ~2.0s to calculate the vertices to restart.
> (more details in
> [https://docs.google.com/document/d/197Ou-01h2obvxq8viKqg4FnOnsykOEKxk3r5WrVBPuA/edit?usp=sharing])
> That's why we'd propose to cache the region boundary structures to improve
> the region failover performance.
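
To illustrate the caching idea proposed above, here is a minimal sketch in Java. It is not the code from the linked diff and does not use Flink's actual classes: RegionBoundaryCache, downstreamRegions, invalidate and the integer vertex/region ids are all hypothetical names chosen for illustration. The sketch assumes the "region boundary structure" can be modelled as the set of other regions reached by edges leaving a region. It computes that set by walking the edges once and reuses the result on later failovers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; these are not Flink classes or Flink APIs.
final class RegionBoundaryCache {
    private final Map<Integer, Integer> regionOfVertex;   // vertex id -> region id
    private final Map<Integer, List<Integer>> consumers;  // vertex id -> consumer vertex ids
    private final Map<Integer, Set<Integer>> cache = new HashMap<>();
    private int computations = 0;

    RegionBoundaryCache(Map<Integer, Integer> regionOfVertex,
                        Map<Integer, List<Integer>> consumers) {
        this.regionOfVertex = regionOfVertex;
        this.consumers = consumers;
    }

    // Returns the regions directly downstream of regionId.
    // The edge walk runs only on the first request for a given region.
    Set<Integer> downstreamRegions(int regionId) {
        Set<Integer> cached = cache.get(regionId);
        if (cached == null) {
            cached = compute(regionId);
            cache.put(regionId, cached);
        }
        return cached;
    }

    // The expensive part: its cost grows with the number of edges inspected.
    private Set<Integer> compute(int regionId) {
        computations++;
        Set<Integer> result = new HashSet<>();
        for (Map.Entry<Integer, Integer> entry : regionOfVertex.entrySet()) {
            if (entry.getValue() != regionId) {
                continue; // vertex belongs to another region
            }
            List<Integer> targets =
                consumers.getOrDefault(entry.getKey(), Collections.emptyList());
            for (int consumer : targets) {
                int targetRegion = regionOfVertex.get(consumer);
                if (targetRegion != regionId) {
                    result.add(targetRegion); // edge crosses the region boundary
                }
            }
        }
        return result;
    }

    // Drops all cached results; needed whenever the topology changes.
    void invalidate() {
        cache.clear();
    }

    // Number of times the edge walk actually ran (for observing cache hits).
    int computationCount() {
        return computations;
    }
}
```

In this sketch the cached sets stay in memory until invalidate() is called, which is one reason a cache like this might be made optional. That trade-off is an assumption of the sketch; the comment above does not say which side effects are meant.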