Hi,

We are using Flink for streaming and find the "stop-the-world" recovery
behavior of Flink prohibitive for use cases that prioritize availability.
Partial recovery as outlined in FLIP-1 would probably alleviate these
concerns.

https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

Looking at the subtasks in https://issues.apache.org/jira/browse/FLINK-4256,
it appears that much of the work has already been done, but there has not
been much recent progress. What is still missing (for streaming)? How close
is version 2 (recovery from limited intermediate results)?

Thanks!
Thomas
