[
https://issues.apache.org/jira/browse/FLINK-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709509#comment-14709509
]
ASF GitHub Bot commented on FLINK-2472:
---------------------------------------
Github user sachingoel0101 commented on a diff in the pull request:
https://github.com/apache/flink/pull/979#discussion_r37767947
--- Diff:
flink-runtime/src/main/java/org/apache/flink/runtime/client/JobClientActor.java
---
@@ -144,11 +268,25 @@ else if (message instanceof Terminated) {
String msg = "Lost connection to JobManager " +
jobManager.path();
logger.info(msg);
submitter.tell(decorateMessage(new
Status.Failure(new Exception(msg))), getSelf());
+ resetContextAndActor();
} else {
logger.error("Received 'Terminated' for unknown
actor " + target);
}
}
+ // ============= No messages received in the job manager timeout duration ========
+ else if (message instanceof ReceiveTimeout){
+ double tolerance = 0.1 *
JOB_CLIENT_JOB_MANAGER_TIMEOUT.toMillis();
--- End diff --
This handles the possibility that Akka enqueues a timeout message right
after we receive the desired message. The tolerance check decides whether the
timeout message is spurious, that is, whether we did in fact get a message from
`JobManager` just before it arrived.
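A minimal sketch of the idea, not the code in the pull request: it assumes the actor records when the last `JobManager` message arrived and treats a `ReceiveTimeout` as genuine only if the idle period is at least the timeout minus the 10% tolerance. The class `TimeoutGuard` and its method names are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of Flink or of this PR. */
public class TimeoutGuard {
    private final long timeoutMillis;
    private final double toleranceMillis;
    private long lastMessageNanos;

    public TimeoutGuard(long timeoutMillis, long nowNanos) {
        this.timeoutMillis = timeoutMillis;
        // Same 10% factor as in the diff; how the PR uses it is an assumption here.
        this.toleranceMillis = 0.1 * timeoutMillis;
        this.lastMessageNanos = nowNanos;
    }

    /** Record that a message from the JobManager was just handled. */
    public void onJobManagerMessage(long nowNanos) {
        lastMessageNanos = nowNanos;
    }

    /** True if we have been idle for (almost) the whole timeout period. */
    public boolean isGenuineTimeout(long nowNanos) {
        long idleMillis = TimeUnit.NANOSECONDS.toMillis(nowNanos - lastMessageNanos);
        return idleMillis >= timeoutMillis - toleranceMillis;
    }

    public static void main(String[] args) {
        TimeoutGuard guard = new TimeoutGuard(1000L, 0L);

        // A JobManager message arrives 950 ms in, just before the timer fires.
        guard.onJobManagerMessage(TimeUnit.MILLISECONDS.toNanos(950));

        // The timeout message enqueued at 1000 ms is spurious: idle for only 50 ms.
        System.out.println(guard.isGenuineTimeout(TimeUnit.MILLISECONDS.toNanos(1000))); // false

        // A timeout at 1950 ms follows a full idle period and is genuine.
        System.out.println(guard.isGenuineTimeout(TimeUnit.MILLISECONDS.toNanos(1950))); // true
    }
}
```

With such a guard, the `ReceiveTimeout` branch would ignore the message when `isGenuineTimeout` returns false and only report a lost connection otherwise.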
> Make the JobClientActor check periodically if the submitted Job is still
> running and if the JobManager is still alive
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-2472
> URL: https://issues.apache.org/jira/browse/FLINK-2472
> Project: Flink
> Issue Type: Improvement
> Reporter: Till Rohrmann
> Assignee: Sachin Goel
>
> If the {{JobManager}} dies without notifying any connected
> {{JobClientActors}}, or if the job execution finishes without sending the
> {{SerializedJobExecutionResult}} back to the {{JobClientActor}}, a call to
> {{JobClient.submitJobAndWait}} might never return.
> I propose to let the {{JobClientActor}} periodically check whether the
> {{JobManager}} is still alive and whether the submitted job is still running.
> If not, the {{JobClientActor}} should return an exception to complete
> the waiting future.