[
https://issues.apache.org/jira/browse/FLINK-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909001#comment-16909001
]
Zhu Zhu commented on FLINK-13056:
---------------------------------
[~till.rohrmann] we tried to optimize it, at the cost of more data cached and
slowing down the region building time.
Taking the sample case with 8000 vertices and 16,000,000 edges as an example.
The failover time reduced from 1961ms to 110ms.
The region building time increases from 523ms to 5681ms as a side effect.
[https://docs.google.com/document/d/1-QLxe4FXqXBuxlYsNmNU-R21euoTkzk1JAS6Lvrd-F4/edit]
> Optimize region failover performance on calculating vertices to restart
> -----------------------------------------------------------------------
>
> Key: FLINK-13056
> URL: https://issues.apache.org/jira/browse/FLINK-13056
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
>
> Currently some region boundary structures are calculated each time of a
> region failover. This calculation can be heavy as its complexity goes up with
> execution edge count.
> We tested it in a sample case with 8000 vertices and 16,000,000 edges. It
> takes ~2.0s to calculate vertices to restart.
> (more details in
> [https://docs.google.com/document/d/197Ou-01h2obvxq8viKqg4FnOnsykOEKxk3r5WrVBPuA/edit?usp=sharing)]
> That's why we'd propose to cache the region boundary structures to improve
> the region failover performance.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)