[ https://issues.apache.org/jira/browse/FLINK-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931427#comment-16931427 ]
Till Rohrmann commented on FLINK-13056:
---------------------------------------

This sounds very promising [~zhuzh]. Let's try to get it in for Flink 1.10. The great thing is that it does not affect any other existing Flink components and is self-contained. This should make it easier to merge.

> Optimize region failover performance on calculating vertices to restart
> -----------------------------------------------------------------------
>
>                 Key: FLINK-13056
>                 URL: https://issues.apache.org/jira/browse/FLINK-13056
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0
>            Reporter: Zhu Zhu
>            Assignee: Zhu Zhu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, some region boundary structures are recalculated on every
> region failover. This calculation can be expensive because its complexity
> grows with the number of execution edges.
> We tested it in a sample case with 8000 vertices and 16,000,000 edges. It
> takes ~2.0s to calculate the vertices to restart.
> (more details in
> [https://docs.google.com/document/d/197Ou-01h2obvxq8viKqg4FnOnsykOEKxk3r5WrVBPuA/edit?usp=sharing])
> We therefore propose caching the region boundary structures to improve
> region failover performance.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
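
The caching idea in the quoted description can be illustrated with a minimal sketch. This is not Flink's implementation: the class `RegionFailoverIndex`, the method `vertices_to_restart`, and the input shapes are hypothetical, and it assumes (as a simplification) that the restart set is the failed vertex's region plus every region transitively downstream of it. The point is only that the region-level boundary structure is derived from the edges once, so each failover walks regions instead of rescanning all execution edges.

```python
from collections import defaultdict, deque


class RegionFailoverIndex:
    """Hypothetical helper that caches region boundary structures."""

    def __init__(self, region_of, edges):
        # region_of: vertex id -> region id (assumed to be precomputed).
        # edges: iterable of (producer_vertex, consumer_vertex) pairs.
        self._region_of = dict(region_of)
        self._members = defaultdict(set)
        for vertex, region in self._region_of.items():
            self._members[region].add(vertex)
        # Cached once: for each region, the regions consuming across its boundary.
        self._downstream = defaultdict(set)
        for producer, consumer in edges:
            src = self._region_of[producer]
            dst = self._region_of[consumer]
            if src != dst:
                self._downstream[src].add(dst)

    def vertices_to_restart(self, failed_vertex):
        # Breadth-first walk over cached region adjacency; no edge rescan here.
        start = self._region_of[failed_vertex]
        seen = {start}
        queue = deque([start])
        while queue:
            region = queue.popleft()
            for nxt in self._downstream.get(region, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return set().union(*(self._members[r] for r in seen))


# Toy graph: region A = {a1, a2}, region B = {b1}, region C = {c1}.
region_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
edges = [("a1", "a2"), ("a1", "b1"), ("a2", "b1"), ("b1", "c1")]
index = RegionFailoverIndex(region_of, edges)
```

In this sketch the one-time cost of building the index is proportional to the edge count, while each later call to `vertices_to_restart` depends only on the number of regions and region-to-region links reached.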