JIN SUN created FLINK-10289:
-------------------------------
Summary: Classify Exceptions to different category for apply
different failover strategy
Key: FLINK-10289
URL: https://issues.apache.org/jira/browse/FLINK-10289
Project: Flink
Issue Type: Sub-task
Components: JobManager
Reporter: JIN SUN
Assignee: JIN SUN
We need to classify exceptions and treat them with different strategies. To do
this, we propose to introduce the following Throwable Types, and the
corresponding exceptions:
* NonRecoverable
  * We shouldn’t retry if an exception is classified as NonRecoverable.
  * For example, NoResourceAvailableException is a NonRecoverable exception.
  * Introduce a new exception, UserCodeException, to wrap all exceptions
    thrown from user code.
* PartitionDataMissingError
  * In certain scenarios, producer data is transferred in blocking mode or
    saved in a persistent store. If the partition is missing, we need to
    revoke/rerun the producer task to regenerate the data.
  * Introduce a new exception, PartitionDataMissingException, to wrap all
    issues of this kind.
* EnvironmentError
  * It is caused by hardware or software issues tied to a specific
    environment. The assumption is that the task will succeed if we run it
    in a different environment, and that other tasks run in the bad
    environment will very likely fail. If multiple tasks fail on the same
    machine due to EnvironmentError, we need to consider adding the bad
    machine to a blacklist and avoiding scheduling tasks on it.
  * Introduce a new exception, EnvironmentException, to wrap all issues of
    this kind.
* Recoverable
  * We assume all other issues are recoverable.
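
The classification above could be sketched roughly as follows. This is only
an illustration of the idea, not the proposed implementation: the enum
ThrowableType, the ThrowableClassifier and EnvironmentFailureTracker helpers,
the constructor signatures, the cause-depth limit, and the blacklist threshold
are all assumptions made for this example. Only the three exception names come
from the proposal. A real version would also map existing Flink exceptions
such as NoResourceAvailableException to NonRecoverable, which is left out here
to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** The four categories listed in the proposal (enum name is hypothetical). */
enum ThrowableType {
    NON_RECOVERABLE,
    PARTITION_DATA_MISSING_ERROR,
    ENVIRONMENT_ERROR,
    RECOVERABLE
}

/** Wraps exceptions thrown from user code; constructor shape is an assumption. */
class UserCodeException extends RuntimeException {
    UserCodeException(Throwable cause) {
        super(cause);
    }
}

/** Signals that a produced partition is gone and must be regenerated. */
class PartitionDataMissingException extends RuntimeException {
    PartitionDataMissingException(String message) {
        super(message);
    }
}

/** Signals a failure tied to the machine or environment the task ran in. */
class EnvironmentException extends RuntimeException {
    EnvironmentException(String message) {
        super(message);
    }
}

/** Hypothetical helper that maps a Throwable to one of the categories. */
final class ThrowableClassifier {
    // Assumed bound so that a cyclic cause chain cannot loop forever.
    private static final int MAX_CAUSE_DEPTH = 32;

    private ThrowableClassifier() {}

    static ThrowableType classify(Throwable error) {
        // Walk the cause chain from the outside in, so an exception that was
        // wrapped by generic framework code is still recognized.
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (current instanceof UserCodeException) {
                return ThrowableType.NON_RECOVERABLE;
            }
            if (current instanceof PartitionDataMissingException) {
                return ThrowableType.PARTITION_DATA_MISSING_ERROR;
            }
            if (current instanceof EnvironmentException) {
                return ThrowableType.ENVIRONMENT_ERROR;
            }
            current = current.getCause();
        }
        // Everything not explicitly classified is assumed recoverable.
        return ThrowableType.RECOVERABLE;
    }
}

/** Hypothetical per-host counter for deciding when to blacklist a machine. */
final class EnvironmentFailureTracker {
    private final int threshold;
    private final Map<String, Integer> failuresByHost = new HashMap<>();

    EnvironmentFailureTracker(int threshold) {
        this.threshold = threshold;
    }

    /** Records one EnvironmentError on the host; true once the threshold is reached. */
    boolean recordFailure(String host) {
        int count = failuresByHost.merge(host, 1, Integer::sum);
        return count >= threshold;
    }
}
```

A failover strategy could then branch on ThrowableClassifier.classify(error):
fail the job for NON_RECOVERABLE, rerun the producer task for
PARTITION_DATA_MISSING_ERROR, reschedule elsewhere (and feed the tracker) for
ENVIRONMENT_ERROR, and apply the normal restart path for RECOVERABLE.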