[
https://issues.apache.org/jira/browse/FLINK-19693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225467#comment-17225467
]
Jin Xing commented on FLINK-19693:
----------------------------------
Hi, Yuan ~
I posted the above comment after reading through this series of JIRAs. The
comment is not strictly about the details of this JIRA, and I don't mean to
distract from the subject. I will open another thread if you and the community
think my concerns are valid. I really appreciate your kindness.
> Scheduler Change for Approximate Local Recovery to Restart Downstream of a
> Failed Task
> --------------------------------------------------------------------------------------
>
> Key: FLINK-19693
> URL: https://issues.apache.org/jira/browse/FLINK-19693
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Yuan Mei
> Assignee: Yuan Mei
> Priority: Major
> Labels: pull-request-available
>
> Enables downstream failover for approximate local recovery.
> That is, if a task fails, the task itself and all of its downstream tasks
> restart. This is achieved by reusing the existing
> {{RestartPipelinedRegionFailoverStrategy}} and treating each individual task
> connected by ResultPartition.Pipelined_Approximate as a separate region.
>
> It introduces an attribute "reconnectable" in ResultPartitionType to indicate
> whether the partition is reconnectable. Note that this is only a temporary
> solution. It will be removed after:
> # Approximate local recovery has its own failover strategy to restart the
> failed set of tasks, instead of restarting the downstream of failed tasks
> by depending on {{RestartPipelinedRegionFailoverStrategy}}
> # FLINK-19895: Unify the life cycle of ResultPartitionType Pipelined Family.
> There is also a good discussion on this in FLINK-19632.
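
The restart rule in the quoted description (a failed task restarts together
with all of its downstream tasks) can be pictured with a small, self-contained
sketch. This is not Flink code and does not use the Flink API: the class
DownstreamFailoverSketch and its methods addEdge and tasksToRestart are
hypothetical names chosen for illustration, and the task graph is assumed to be
a plain producer-to-consumer map in which every task forms its own region.

```java
import java.util.*;

// Illustrative sketch only; these names are hypothetical and not part of Flink.
public class DownstreamFailoverSketch {
    // Maps a producer task to the tasks that consume its output.
    private final Map<String, List<String>> consumers = new HashMap<>();

    // Registers a producer -> consumer edge in the task graph.
    public void addEdge(String producer, String consumer) {
        consumers.computeIfAbsent(producer, k -> new ArrayList<>()).add(consumer);
    }

    // Returns the failed task plus every task reachable downstream of it,
    // in breadth-first order. Upstream tasks are never included.
    public Set<String> tasksToRestart(String failedTask) {
        Set<String> toRestart = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(failedTask);
        while (!queue.isEmpty()) {
            String task = queue.poll();
            // add() returns false for tasks already visited, so shared
            // downstream tasks are only expanded once.
            if (toRestart.add(task)) {
                queue.addAll(consumers.getOrDefault(task, Collections.emptyList()));
            }
        }
        return toRestart;
    }
}
```

With edges source -> map and map -> sink, a failure of map would select map and
sink for restart while leaving source running, which is the behaviour the
description attributes to treating each task as a separate region.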
--
This message was sent by Atlassian Jira
(v8.3.4#803005)