[ https://issues.apache.org/jira/browse/FLINK-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969976#comment-16969976 ]
Victor Wong commented on FLINK-14653:
-------------------------------------
_"I'm ok with tolerating checkpointing failures, but not ok with sacrificing
the correctness of my Flink job."_
Sorry, I don't follow. AFAIK, when a checkpoint fails, the whole job restarts
and restores from the latest checkpointed state. So what do you mean by
"sacrificing the correctness": data loss or data duplication?
> Job-related errors in snapshotState do not result in job failure
> ----------------------------------------------------------------
>
> Key: FLINK-14653
> URL: https://issues.apache.org/jira/browse/FLINK-14653
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Reporter: Maximilian Michels
> Priority: Minor
>
> When users override {{snapshotState}}, they might include logic there which
> is crucial for the correctness of their application, e.g. finalizing a
> transaction and buffering the results of that transaction, or flushing events
> to an external store. Exceptions occurring there should fail the job.
> Currently, users must make sure to throw a {{Throwable}} that is not an
> {{Exception}}, because any {{Exception}} will be caught by the task and
> reported as a checkpointing error, when it could be an application error.
> It would be helpful to update the documentation and introduce a special
> exception that can be thrown for job-related failures, e.g.
> {{ApplicationError}} or similar.
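
To make the behaviour described in the issue concrete, here is a minimal
plain-Java sketch. It does not use any Flink classes: Snapshottable and
runCheckpoint are hypothetical stand-ins for the user function and the
task-side wrapper, and ApplicationError is only the name proposed in the
issue, not an existing Flink type. It assumes, as the issue states, that the
wrapper catches Exception and nothing broader.

```java
// Plain-Java illustration only; none of these types are Flink classes.
final class SnapshotErrorDemo {

    /** Hypothetical job-level failure type; extends Error, not Exception. */
    static final class ApplicationError extends Error {
        ApplicationError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for a user function that overrides snapshotState. */
    interface Snapshottable {
        void snapshotState() throws Exception;
    }

    /**
     * Stand-in for the task-side wrapper described in the issue: any
     * Exception is caught and reported as a checkpointing error.
     */
    static String runCheckpoint(Snapshottable fn) {
        try {
            fn.snapshotState();
            return "checkpoint completed";
        } catch (Exception e) {
            // The failure is recorded against the checkpoint, not the job.
            return "checkpoint declined: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An ordinary Exception is absorbed by the wrapper.
        System.out.println(runCheckpoint(() -> {
            throw new IllegalStateException("flush failed");
        }));

        // An Error is not an Exception, so it escapes the catch block and
        // reaches the caller, which here plays the role of failing the job.
        try {
            runCheckpoint(() -> {
                throw new ApplicationError("transaction commit failed", null);
            });
        } catch (ApplicationError e) {
            System.out.println("job failed: " + e.getMessage());
        }
    }
}
```

The sketch only shows why the exception type matters under that assumption;
how a real task reacts to a declined checkpoint depends on the Flink version
and its checkpoint failure settings.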
--
This message was sent by Atlassian Jira
(v8.3.4#803005)