Hi,
1. Currently, much of the work in FLINK-4256 is about failover improvements in
the bounded dataset scenario.
2. For the streaming scenario, a new shuffle plugin plus a proper failover
strategy could avoid the "stop-the-world" recovery.
3. We have already done a lot of work on the new shuffle in the old Flink
shuffle architecture, because many of our customers share this concern. We
plan to move that work to the new Flink pluggable shuffle architecture.

Best,
Guowei


Thomas Weise <t...@apache.org> wrote on Fri, Jul 26, 2019 at 8:54 AM:

> Hi,
>
> We are using Flink for streaming and find the "stop-the-world" recovery
> behavior of Flink prohibitive for use cases that prioritize availability.
> Partial recovery as outlined in FLIP-1 would probably alleviate these
> concerns.
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
>
> Looking at the subtasks in
> https://issues.apache.org/jira/browse/FLINK-4256 it
> appears that much of the work was already done but not much recent
> progress? What is missing (for streaming)? How close is version 2 (recovery
> from limited intermediate results)?
>
> Thanks!
> Thomas
>
