[ https://issues.apache.org/jira/browse/FLINK-18112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuan Mei updated FLINK-18112:
-----------------------------
    Environment:     (was: Build a prototype of single task failure recovery to address and answer the following questions:

Step 1: Scheduling part: restart a single node without restarting the upstream or downstream nodes.

Step 2: Checkpointing part: based on my understanding of how regional failover works, this part might not need modification.

Step 3: Network part
  - how the recovered node is able to link to the upstream ResultPartitions and continue getting data
  - how the downstream node is able to link to the recovered node and continue getting data
  - how the different netty transit modes affect the results
  - what happens if the failed node's buffered data pool is full

Step 4: Failover process verification)

> Single Task Failure Recovery Prototype
> --------------------------------------
>
>                 Key: FLINK-18112
>                 URL: https://issues.apache.org/jira/browse/FLINK-18112
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing, Runtime / Coordination, Runtime / Network
>    Affects Versions: 1.12.0
>            Reporter: Yuan Mei
>            Assignee: Yuan Mei
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Build a prototype of single task failure recovery to address and answer the
> following questions:
> Step 1: Scheduling part: restart a single node without restarting the
> upstream or downstream nodes.
> Step 2: Checkpointing part: based on my understanding of how regional
> failover works, this part might not need modification.
> Step 3: Network part
>   - how the recovered node is able to link to the upstream ResultPartitions
> and continue getting data
>   - how the downstream node is able to link to the recovered node and
> continue getting data
>   - how the different netty transit modes affect the results
>   - what happens if the failed node's buffered data pool is full
> Step 4: Failover process verification

--
This message was sent by Atlassian Jira
(v8.3.4#803005)