[
https://issues.apache.org/jira/browse/FLINK-18112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuan Mei updated FLINK-18112:
-----------------------------
Summary: Approximate Task-Local Recovery (was: Single Task Approximate
Failure Recovery)
> Approximate Task-Local Recovery
> -------------------------------
>
> Key: FLINK-18112
> URL: https://issues.apache.org/jira/browse/FLINK-18112
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing, Runtime / Coordination, Runtime
> / Network
> Affects Versions: 1.12.0
> Reporter: Yuan Mei
> Assignee: Yuan Mei
> Priority: Major
> Fix For: 1.12.0
>
>
> Build a prototype of single-task failure recovery to answer the
> following questions:
> *Step 1*: Scheduling part: restart a single node without restarting the
> upstream or downstream nodes.
> *Step 2*: Checkpointing part: based on my understanding of how regional
> failover works, this part might not need modification.
> *Step 3*: Network part:
> - how the recovered node is able to link to the upstream ResultPartitions and
> continue getting data
> - how the downstream node is able to link to the recovered node and continue
> getting data
> - how different Netty transit modes affect the results
> - what happens if the failed node's buffered data pool is full
> *Step 4*: Failover process verification
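
To make the goal of Step 1 concrete, here is a minimal sketch (not Flink code) that contrasts the set of tasks restarted by regional failover with the set restarted by single-task recovery. It assumes every edge in the job graph is pipelined, so the failover region is simply the connected component containing the failed task. The names `pipelined_region`, `tasks_to_restart`, and the `approximate` flag are hypothetical and exist only for illustration.

```python
from collections import deque


def pipelined_region(edges, failed):
    """Return every task connected to `failed` through pipelined edges.

    Assumption: all edges are pipelined, so the region is the connected
    component of the failed task in the undirected job graph.
    """
    neighbours = {}
    for upstream, downstream in edges:
        neighbours.setdefault(upstream, set()).add(downstream)
        neighbours.setdefault(downstream, set()).add(upstream)

    seen = {failed}
    queue = deque([failed])
    while queue:
        task = queue.popleft()
        for other in neighbours.get(task, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


def tasks_to_restart(edges, failed, approximate):
    """Hypothetical helper: choose which tasks to restart after a failure."""
    if approximate:
        # Single-task recovery: only the failed task is restarted.
        return {failed}
    # Regional failover: the whole pipelined region is restarted.
    return pipelined_region(edges, failed)


# Toy job graph: source -> map -> sink, all pipelined.
edges = [("source", "map"), ("map", "sink")]
regional = tasks_to_restart(edges, "map", approximate=False)
single = tasks_to_restart(edges, "map", approximate=True)
print(sorted(regional))
print(sorted(single))
```

In this toy graph regional failover restarts all three tasks, while single-task recovery restarts only `map`. That is why the questions under Step 3 matter: the surviving upstream and downstream tasks must reconnect to the restarted one.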