Stephan and I wrote up the following document on how to handle task failures,
how to attribute each failure to its correct root cause, and how to suppress
follow-up failures. It defines the behaviour that should be followed for the
different kinds of task failures.

https://cwiki.apache.org/confluence/display/FLINK/Task+Failures+and+Error+Handling

Feel free to comment.

If there are no objections, I will open issues for the individual points.

– Ufuk
