Yuan Mei created FLINK-18112:
--------------------------------

             Summary: Single Task Failure Recovery Prototype
                 Key: FLINK-18112
                 URL: https://issues.apache.org/jira/browse/FLINK-18112
             Project: Flink
          Issue Type: New Feature
          Components: Runtime / Checkpointing, Runtime / Coordination, Runtime / Network
    Affects Versions: 1.12.0
         Environment: Build a prototype of single task failure recovery to address and answer the following questions:
Step 1: Scheduling part: restart a single node without restarting the upstream or downstream nodes.
Step 2: Checkpointing part: as far as I understand how regional failover works, this part might not need modification.
Step 3: Network part
 - how the recovered node is able to link to the upstream ResultPartitions and continue getting data
 - how the downstream node is able to link to the recovered node and continue getting data
 - how the different netty transit modes affect the results
 - what happens if the failed node's buffered data pool is full
Step 4: Failover process verification
            Reporter: Yuan Mei
             Fix For: 1.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)