[
https://issues.apache.org/jira/browse/FLINK-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gyula Fora updated FLINK-27802:
-------------------------------
Description:
When a job is submitted with an incorrect savepoint path, Flink swallows the
error because of the result store:
2022-05-26 12:34:43,497 WARN
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Ignoring JobGraph
submission 'State machine job' (00000000000000000000000000000000) because the
job already reached a globally-terminal state (i.e. FAILED, CANCELED, FINISHED)
in a previous execution.
2022-05-26 12:34:43,552 INFO
org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap
[] - Application completed SUCCESSFULLY
The easiest way to reproduce this is to create a new deployment and set
initialSavepointPath to a path that does not exist.
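
The two log lines above form a recognizable pattern: the dispatcher ignores the
JobGraph submission, and the application is then reported as successful. A
minimal sketch of a check for that pattern in JobManager log text follows. The
function name and the matching strategy are illustrative assumptions, not part
of Flink or the operator; the matched substrings are copied from the log lines
quoted above.

```python
import re

# Substrings copied from the two log lines quoted in this issue.
IGNORED_SUBMISSION = "because the job already reached a globally-terminal state"
REPORTED_SUCCESS = "Application completed SUCCESSFULLY"


def has_swallowed_failure(log_text: str) -> bool:
    """Hypothetical helper: True if an ignored JobGraph submission is
    followed by a report that the application completed successfully."""
    # Log messages may be wrapped across lines (as in this email), so
    # collapse every run of whitespace into one space before searching.
    flat = re.sub(r"\s+", " ", log_text)
    ignored_at = flat.find(IGNORED_SUBMISSION)
    if ignored_at == -1:
        return False
    # Only count a success report that appears after the ignored submission.
    return flat.find(REPORTED_SUCCESS, ignored_at) != -1
```

Passing the contents of a JobManager log to has_swallowed_failure() flags a
deployment whose failure was hidden this way. It is a text heuristic only and
does not query the result store.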
was:
We currently set both a result store and the
"execution.submit-failed-job-on-application-error" config for HA jobs.
As a result, job submission errors are swallowed: they show up only in the
result store, and the Flink job is never displayed in the failed state:
2022-05-26 12:34:43,497 WARN
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Ignoring JobGraph
submission 'State machine job' (00000000000000000000000000000000) because the
job already reached a globally-terminal state (i.e. FAILED, CANCELED, FINISHED)
in a previous execution.
2022-05-26 12:34:43,552 INFO
org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap
[] - Application completed SUCCESSFULLY
The easiest way to reproduce this is to create a new deployment and set
initialSavepointPath to a path that does not exist.
I consider this a bug in Flink, but we should simply disable the
"execution.submit-failed-job-on-application-error" config.
> Savepoint restore errors are swallowed for Flink 1.15
> -----------------------------------------------------
>
> Key: FLINK-27802
> URL: https://issues.apache.org/jira/browse/FLINK-27802
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.0.0
> Reporter: Gyula Fora
> Priority: Critical