[
https://issues.apache.org/jira/browse/FLINK-18112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuan Mei updated FLINK-18112:
-----------------------------
Environment: (was: Build a prototype of single task failure recovery to
address and answer the following questions:
Step 1: Scheduling part, restart a single node without restarting the upstream
or downstream nodes.
Step 2: Checkpointing part. As I understand how regional failover works,
this part might not need modification.
Step 3: Network part
- how the recovered node is able to link to the upstream ResultPartitions and
continue getting data
- how the downstream node is able to link to the recovered node and continue
getting data
- how different Netty transit modes affect the results
- what happens if the failed node's buffered data pool is full
Step 4: Failover process verification)
> Single Task Failure Recovery Prototype
> --------------------------------------
>
> Key: FLINK-18112
> URL: https://issues.apache.org/jira/browse/FLINK-18112
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing, Runtime / Coordination, Runtime
> / Network
> Affects Versions: 1.12.0
> Reporter: Yuan Mei
> Assignee: Yuan Mei
> Priority: Major
> Fix For: 1.12.0
>
>
> Build a prototype of single task failure recovery to address and answer the
> following questions:
> Step 1: Scheduling part, restart a single node without restarting the
> upstream or downstream nodes.
> Step 2: Checkpointing part. As I understand how regional failover works,
> this part might not need modification.
> Step 3: Network part
> - how the recovered node is able to link to the upstream ResultPartitions and
> continue getting data
> - how the downstream node is able to link to the recovered node and continue
> getting data
> - how different Netty transit modes affect the results
> - what happens if the failed node's buffered data pool is full
> Step 4: Failover process verification
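
To illustrate the scheduling question in Step 1, here is a minimal sketch
contrasting a region-style restart (every task connected to the failed one
through pipelined edges is restarted) with the single-task restart this
prototype explores. It is a toy model only: FailoverSketch, RestartStrategy,
pipelinedNeighbors, regional and singleTask are hypothetical names invented
for this example and are not Flink classes or APIs, and the three-task
job graph is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FailoverSketch {

    /** Hypothetical stand-in for a failover strategy; not Flink's interface. */
    interface RestartStrategy {
        Set<String> tasksToRestart(String failedTask);
    }

    /** Builds an undirected adjacency map from pipelined producer/consumer edges. */
    static Map<String, Set<String>> pipelinedNeighbors(String[][] edges) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }
        return adj;
    }

    /** Region-style restart: everything reachable over pipelined edges. */
    static RestartStrategy regional(Map<String, Set<String>> adj) {
        return failed -> {
            Set<String> region = new TreeSet<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(failed);
            while (!queue.isEmpty()) {
                String task = queue.poll();
                // Expand a task's neighbors only the first time it is seen.
                if (region.add(task)) {
                    queue.addAll(adj.getOrDefault(task, Collections.emptySet()));
                }
            }
            return region;
        };
    }

    /** Single-task restart: only the failed task, leaving neighbors running. */
    static RestartStrategy singleTask() {
        return failed -> Collections.singleton(failed);
    }

    public static void main(String[] args) {
        // Assumed toy job graph: source -> map -> sink, all pipelined.
        String[][] edges = {{"source", "map"}, {"map", "sink"}};
        Map<String, Set<String>> adj = pipelinedNeighbors(edges);
        System.out.println("regional:    " + regional(adj).tasksToRestart("map"));
        System.out.println("single task: " + singleTask().tasksToRestart("map"));
    }
}
```

In this toy model the regional strategy restarts all three tasks when "map"
fails, while the single-task strategy restarts only "map". The sketch says
nothing about Steps 2 to 4: how the restarted task reconnects to its upstream
and downstream neighbors is exactly what the prototype is meant to answer.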