[
https://issues.apache.org/jira/browse/FLINK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063843#comment-15063843
]
ASF GitHub Bot commented on FLINK-3184:
---------------------------------------
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1468#issuecomment-165752725
With the upcoming Mesos integration (and some YARN refactoring), we can
probably also drop the Akka heartbeats between master and worker.
> Decrease Akka timeouts on cluster side to make system more responsive
> ---------------------------------------------------------------------
>
> Key: FLINK-3184
> URL: https://issues.apache.org/jira/browse/FLINK-3184
> Project: Flink
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Minor
>
> Currently, the default timeout for futures is set to 100 s. This is also the
> time used to wait in between restart attempts if no other value has been
> explicitly specified. Especially in the streaming case, it is often necessary
> to detect and react to failures in a shorter period than 100 s.
> Therefore, I propose to decrease the default timeout to 10 s.
> Additionally, I propose to introduce a slightly higher timeout for the client
> side (e.g. 60 s). The reason is that in case of a {{JobManager}} failure the client
> has to wait until the cluster has recovered. Using ZooKeeper for that can
> entail a longer timeout than 10 s. In such a case a recovery could be falsely
> recognized as a lost connection.
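
The reasoning in the description can be sketched roughly as follows. This is only an illustration, not Flink code: the class, the constant names, and the 25 s recovery time are assumptions made up for the example; only the 100 s, 10 s and 60 s values come from the issue text.

```java
import java.time.Duration;

// Hypothetical sketch: compares the timeouts discussed in the issue with an
// assumed recovery time. None of these names exist in Flink.
public class TimeoutSketch {
    // Values from the issue text.
    static final Duration OLD_DEFAULT_TIMEOUT = Duration.ofSeconds(100);
    static final Duration CLUSTER_TIMEOUT = Duration.ofSeconds(10);
    static final Duration CLIENT_TIMEOUT = Duration.ofSeconds(60);

    // True if a caller that waits at most `timeout` gives up before a
    // recovery lasting `recovery` has finished.
    static boolean givesUpBeforeRecovery(Duration recovery, Duration timeout) {
        return recovery.compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        // Assumed duration of a ZooKeeper-based recovery, for illustration only.
        Duration recovery = Duration.ofSeconds(25);

        // With the cluster-side timeout the client would report a lost connection.
        System.out.println("10 s timeout gives up early: "
                + givesUpBeforeRecovery(recovery, CLUSTER_TIMEOUT));

        // With the higher client-side timeout the client waits out the recovery.
        System.out.println("60 s timeout gives up early: "
                + givesUpBeforeRecovery(recovery, CLIENT_TIMEOUT));
    }
}
```

Under these assumptions the 10 s timeout would misreport the recovery as a lost connection, while the 60 s client timeout would not.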
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)