[ 
https://issues.apache.org/jira/browse/FLINK-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994745#comment-16994745
 ] 

Stephan Ewen commented on FLINK-14653:
--------------------------------------

I think it is valid to distinguish between "checkpoint materialization failed" 
and "program failed in a checkpoint method".

[~mxm] Could this simple solution work: Throwing an error in "snapshotState()" 
(the synchronous part) causes the job to fail, while errors in the asynchronous 
parts cause only the checkpoint to fail. The underlying assumption would be 
that everything in the synchronous part might affect the correctness of the 
job, while everything in the asynchronous part would only affect the checkpoint.
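
A rough sketch of that rule, to make the proposal concrete. This is plain Java 
with hypothetical names (CheckpointFailureSketch, SnapshotPhases, Outcome, 
runCheckpoint); it is not Flink's actual task code, and the async part is 
called inline only to keep the example short.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; none of these types exist in Flink.
final class CheckpointFailureSketch {

    /** Possible results of one checkpoint attempt under the proposed rule. */
    enum Outcome { COMPLETED, CHECKPOINT_DECLINED, JOB_FAILED }

    /** Stand-in for an operator whose snapshot has a sync and an async phase. */
    interface SnapshotPhases {
        // Synchronous part: may contain user logic (flush, commit, ...).
        // Returns the asynchronous part, which only materializes the state.
        Callable<byte[]> snapshotStateSync() throws Exception;
    }

    static Outcome runCheckpoint(SnapshotPhases operator) {
        Callable<byte[]> asyncPart;
        try {
            asyncPart = operator.snapshotStateSync();
        } catch (Exception e) {
            // Assumption from the proposal: anything failing here might
            // affect the correctness of the job, so the job fails.
            return Outcome.JOB_FAILED;
        }
        try {
            // A real runtime would run this on a separate thread.
            asyncPart.call();
            return Outcome.COMPLETED;
        } catch (Exception e) {
            // Materialization failed: only this checkpoint is affected.
            return Outcome.CHECKPOINT_DECLINED;
        }
    }

    public static void main(String[] args) {
        // Both phases succeed.
        System.out.println(runCheckpoint(() -> () -> new byte[0]));
        // User logic in the synchronous part throws.
        System.out.println(runCheckpoint(() -> {
            throw new IllegalStateException("flush to external store failed");
        }));
        // Only the asynchronous materialization throws.
        System.out.println(runCheckpoint(() -> () -> {
            throw new IOException("state upload failed");
        }));
    }
}
```

Running main prints COMPLETED, JOB_FAILED and CHECKPOINT_DECLINED, in that 
order. The classification depends only on which phase threw, not on the 
exception type, so users would not need a special exception class.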

> Job-related errors in snapshotState do not result in job failure
> ----------------------------------------------------------------
>
>                 Key: FLINK-14653
>                 URL: https://issues.apache.org/jira/browse/FLINK-14653
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>            Reporter: Maximilian Michels
>            Priority: Minor
>
> When users override {{snapshotState}}, they might include logic there which 
> is crucial for the correctness of their application, e.g. finalizing a 
> transaction and buffering the results of that transaction, or flushing events 
> to an external store. Exceptions occurring should lead to failing the job.
> Currently, users must make sure to throw a {{Throwable}} because any 
> {{Exception}} will be caught by the task and reported as a checkpointing error, 
> when it could be an application error.
> It would be helpful to update the documentation and introduce a special 
> exception that can be thrown for job-related failures, e.g. 
> {{ApplicationError}} or similar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
