[ https://issues.apache.org/jira/browse/FLINK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062548#comment-15062548 ]
ASF GitHub Bot commented on FLINK-3184:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/1468
[FLINK-3184] [timeouts] Decrease timeouts
This PR introduces a client-side timeout of 60 s and a cluster-side timeout
of 10 s. They can be configured via `akka.client.timeout` and
`akka.ask.timeout`, respectively.
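A minimal sketch of how the two keys might appear in a flink-conf.yaml and how their values relate. The parser is a hypothetical helper, not Flink's own code: it assumes a simplified "<number> <unit>" duration syntax (Flink's real parsing accepts more formats), and its fallback values are the new defaults this PR proposes.

```python
import re

# Hypothetical flink-conf.yaml excerpt. The keys come from the PR; the values
# mirror its new defaults (10 s on the cluster side, 60 s on the client side).
FLINK_CONF = """
akka.ask.timeout: 10 s
akka.client.timeout: 60 s
"""

# Simplified duration grammar (an assumption, not Flink's parser).
_UNITS = {"ms": 0.001, "s": 1.0, "min": 60.0, "h": 3600.0}
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s|min|h)\s*$")


def parse_duration(text: str) -> float:
    """Convert a string such as '10 s' into seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def load_timeouts(conf_text: str) -> dict:
    """Read 'key: value' lines and return both timeouts in seconds."""
    entries = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        entries[key.strip()] = value.strip()
    return {
        # Fall back to the defaults proposed in this PR when a key is absent.
        "cluster": parse_duration(entries.get("akka.ask.timeout", "10 s")),
        "client": parse_duration(entries.get("akka.client.timeout", "60 s")),
    }


timeouts = load_timeouts(FLINK_CONF)
print(timeouts)  # {'cluster': 10.0, 'client': 60.0}
```

The client value is deliberately the larger of the two, so a client keeps waiting while the cluster itself is still recovering.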
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink decreaseAkkaTimeout
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1468.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1468
----
commit 754c0c408d92e931218a137f388fb77f51df964a
Author: Till Rohrmann
Date: 2015-12-15T14:15:12Z
Harmonize config key for number of retries and retry delay
commit dd81da02ca6eaf8e0e38cf4511e26cb553c71f72
Author: Till Rohrmann
Date: 2015-12-15T16:34:17Z
Add missing param descriptions to FlinkYarnCluster, remove implicit timeout
from ApplicationClient
commit 5e967bf8a9ba066be73905338acfd5deb4894602
Author: Till Rohrmann
Date: 2015-12-15T16:37:20Z
[FLINK-3184] [timeouts] Set default cluster side timeout to 10 s and the
client side timeout to 60 s.
Adapt Akka failure detector timings to respect new 10 s Akka ask timeout.
Add logging statements to JobClientActor
Introduce separation between client and cluster timeout
Sets the cluster timeout to 10 s and the client timeout to 60 s.
----
> Decrease Akka timeouts on cluster side to make system more responsive
> ---------------------------------------------------------------------
>
> Key: FLINK-3184
> URL: https://issues.apache.org/jira/browse/FLINK-3184
> Project: Flink
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Minor
>
> Currently, the default timeout for futures is set to 100 s. This is also the
> time used to wait between restart attempts if no other value has been
> explicitly specified. Especially in the streaming case, it is often necessary
> to detect and react to failures in a shorter period than 100 s.
> Therefore, I propose to decrease the default timeout to 10 s.
> Additionally, I propose to introduce a slightly higher timeout for the client
> side (e.g. 60 s). The reason is that in case of a {{JobManager}} failure, the
> client has to wait until the cluster has recovered. Recovery via ZooKeeper can
> take longer than 10 s, in which case the recovery could be falsely
> recognized as a lost connection.
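The argument for a separate client timeout can be illustrated with a small asyncio simulation. Python is used only for illustration, all names are hypothetical, and the timings are scaled down by a factor of 100. It assumes a JobManager failover that takes 30 s, a made-up value chosen to lie between the two timeouts; real ZooKeeper-based recovery times vary.

```python
import asyncio

# Scaled timings: 1 s in the issue corresponds to 0.01 s here.
SCALE = 0.01
CLUSTER_TIMEOUT = 10 * SCALE  # plays the role of akka.ask.timeout
CLIENT_TIMEOUT = 60 * SCALE   # plays the role of akka.client.timeout
RECOVERY_TIME = 30 * SCALE    # hypothetical JobManager failover duration


async def leader_after_recovery(recovery_time: float) -> str:
    """Simulate a new JobManager leader that only answers after recovery."""
    await asyncio.sleep(recovery_time)
    return "job status from new leader"


async def ask(timeout: float) -> str:
    """Wait for an answer; a timeout is interpreted as a lost connection."""
    try:
        return await asyncio.wait_for(leader_after_recovery(RECOVERY_TIME), timeout)
    except asyncio.TimeoutError:
        return "connection lost"


async def main() -> None:
    # The short cluster-side timeout gives up in the middle of the failover.
    print("10 s timeout:", await ask(CLUSTER_TIMEOUT))
    # The longer client-side timeout outlasts the failover.
    print("60 s timeout:", await ask(CLIENT_TIMEOUT))


asyncio.run(main())
```

With these assumed timings, the first call reports "connection lost" although the cluster recovers shortly afterwards, while the second receives the answer from the new leader.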
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)