Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1633#discussion_r52998567
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -1073,57 +1073,73 @@ class JobManager(
           // execute the recovery/writing the jobGraph into the SubmittedJobGraphStore asynchronously
           // because it is a blocking operation
           future {
    -        try {
    -          if (isRecovery) {
    -            executionGraph.restoreLatestCheckpointedState()
    -          }
    -          else {
    -            val snapshotSettings = jobGraph.getSnapshotSettings
    -            if (snapshotSettings != null) {
    -              val savepointPath = snapshotSettings.getSavepointPath()
    +        val restoreStateSuccess =
    +          try {
    +            if (isRecovery) {
    +              executionGraph.restoreLatestCheckpointedState()
    --- End diff --
    
    Right now, a failure during job recovery simply fails the `ExecutionGraph`, which triggers a restart. A successful job recovery sends a `JobSubmitSuccess` to the client. I'm not sure whether this is actually correct, since the client already received a `JobSubmitMessage` from the `JobManager` when it initially submitted the job. But I think the second message will simply be ignored.
    
    Thus, suppressing the restart in case of a job recovery would actually change the current behaviour.
    
    If it is possible to recover from failures that occur while recovering a job or restoring a savepoint, then the job should not be failed directly without a restart. Maybe one should distinguish the two cases based on the exception that actually occurred.
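
    To illustrate what distinguishing by exception could look like, here is a minimal sketch. Everything in it is hypothetical and not part of Flink: `RecoveryFailureAction`, `RestartJob`, `FailJobPermanently` and `RecoveryFailurePolicy.decide` are names made up for this example, and the mapping from exception types to actions is only an assumption about which failures might be transient.

```scala
import java.io.IOException
import scala.util.control.NonFatal

// Hypothetical result type describing how to react to a failed recovery.
sealed trait RecoveryFailureAction
case object RestartJob extends RecoveryFailureAction
case object FailJobPermanently extends RecoveryFailureAction

object RecoveryFailurePolicy {

  // Assumed classification, for illustration only:
  //  - I/O problems (e.g. state storage temporarily unreachable) may be transient,
  //    so a restart is worth trying.
  //  - Illegal state or arguments (e.g. a savepoint that does not match the job)
  //    would fail again on every attempt, so the job is failed without a restart.
  //  - Any other non-fatal exception keeps the restart behaviour.
  //  - Fatal errors (e.g. OutOfMemoryError) fail the job.
  def decide(cause: Throwable): RecoveryFailureAction = cause match {
    case _: IOException => RestartJob
    case _: IllegalStateException | _: IllegalArgumentException => FailJobPermanently
    case NonFatal(_) => RestartJob
    case _ => FailJobPermanently
  }
}
```

    The `catch` clause around the restore call could then consult something like `RecoveryFailurePolicy.decide(t)` and either fail the `ExecutionGraph` (triggering a restart, as today) or fail the job for good.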

