[ 
https://issues.apache.org/jira/browse/FLINK-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora updated FLINK-27802:
-------------------------------
    Description: 
When a job is submitted with an incorrect savepoint path, the error is 
swallowed by Flink due to the result store:

2022-05-26 12:34:43,497 WARN org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Ignoring JobGraph submission 'State machine job' (00000000000000000000000000000000) because the job already reached a globally-terminal state (i.e. FAILED, CANCELED, FINISHED) in a previous execution.
2022-05-26 12:34:43,552 INFO org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap [] - Application completed SUCCESSFULLY

The easiest way to reproduce this is to create a new deployment and set 
initialSavepointPath to a random missing path.
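
The reproduction step above could be sketched roughly as follows. This is a 
minimal sketch, not the reporter's actual setup: the deployment name, image, 
service account, resource sizes, HA storage directory, jar location and the 
missing savepoint path are all assumed placeholder values, and 
build_deployment is a hypothetical helper. Only initialSavepointPath and the 
"State machine job" come from the report. The script prints a FlinkDeployment 
manifest as JSON, which can then be applied with kubectl.

```python
import json


def build_deployment(savepoint_path: str) -> dict:
    """Build a FlinkDeployment manifest that restores from savepoint_path.

    Every value except initialSavepointPath is an assumed placeholder.
    """
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": "savepoint-repro"},  # assumed name
        "spec": {
            "image": "flink:1.15",  # assumed image
            "flinkVersion": "v1_15",
            "serviceAccount": "flink",  # assumed service account
            "flinkConfiguration": {
                # Assumed HA setup; the storage directory is a placeholder.
                "high-availability": (
                    "org.apache.flink.kubernetes.highavailability."
                    "KubernetesHaServicesFactory"
                ),
                "high-availability.storageDir": "file:///flink-data/ha",
            },
            "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "job": {
                # Assumed location of the state machine example jar.
                "jarURI": (
                    "local:///opt/flink/examples/streaming/"
                    "StateMachineExample.jar"
                ),
                "parallelism": 2,
                "upgradeMode": "stateless",
                "state": "running",
                # The field from the report: point it at a missing path.
                "initialSavepointPath": savepoint_path,
            },
        },
    }


if __name__ == "__main__":
    manifest = build_deployment("file:///flink-data/savepoints/does-not-exist")
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as repro.py, "python repro.py | kubectl apply -f -" 
would create the deployment; the JobManager log can then be checked for the 
two lines quoted above.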

  was:
We are currently setting both a result store and the 
"execution.submit-failed-job-on-application-error" config for HA jobs.

This leads to swallowed job submission errors that show up only in the result 
store, while the Flink job itself is never displayed in the failed state:


2022-05-26 12:34:43,497 WARN org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Ignoring JobGraph submission 'State machine job' (00000000000000000000000000000000) because the job already reached a globally-terminal state (i.e. FAILED, CANCELED, FINISHED) in a previous execution.
2022-05-26 12:34:43,552 INFO org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap [] - Application completed SUCCESSFULLY


The easiest way to reproduce this is to create a new deployment and set 
initialSavepointPath to a random missing path.

I consider this a bug in Flink, but we should simply disable the 
execution.submit-failed-job-on-application-error config option.


> Savepoint restore errors are swallowed for Flink 1.15
> -----------------------------------------------------
>
>                 Key: FLINK-27802
>                 URL: https://issues.apache.org/jira/browse/FLINK-27802
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.0.0
>            Reporter: Gyula Fora
>            Priority: Critical
>
> When a job is submitted with an incorrect savepoint path the error is 
> swallowed by Flink due to the result store:
>
> 2022-05-26 12:34:43,497 WARN org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Ignoring JobGraph submission 'State machine job' (00000000000000000000000000000000) because the job already reached a globally-terminal state (i.e. FAILED, CANCELED, FINISHED) in a previous execution.
> 2022-05-26 12:34:43,552 INFO org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap [] - Application completed SUCCESSFULLY
>
> The easiest way to reproduce this is to create a new deployment and set 
> initialSavepointPath to a random missing path.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
