[ 
https://issues.apache.org/jira/browse/FLINK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062548#comment-15062548
 ] 

ASF GitHub Bot commented on FLINK-3184:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/1468

    [FLINK-3184] [timeouts] Decrease timeouts

    This PR introduces a client-side timeout of 60 s and a cluster-side timeout 
    of 10 s. The timeouts can be configured via `akka.client.timeout` and 
    `akka.ask.timeout`, respectively, in the configuration.
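
As a minimal sketch, the two settings might look like this in the Flink configuration file. The keys and values are the ones named above; the file name `flink-conf.yaml` and the `<number> <unit>` duration syntax are assumptions not stated in this message.

```yaml
# Sketch only: file name and value syntax are assumptions, not taken from the PR text
akka.ask.timeout: 10 s      # cluster-side timeout
akka.client.timeout: 60 s   # client-side timeout
```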

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink decreaseAkkaTimeout

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1468
    
----
commit 754c0c408d92e931218a137f388fb77f51df964a
Author: Till Rohrmann <[email protected]>
Date:   2015-12-15T14:15:12Z

    Harmonize config key for number of retries and retry delay

commit dd81da02ca6eaf8e0e38cf4511e26cb553c71f72
Author: Till Rohrmann <[email protected]>
Date:   2015-12-15T16:34:17Z

    Add missing param descriptions to FlinkYarnCluster, remove implicit timeout 
    from ApplicationClient

commit 5e967bf8a9ba066be73905338acfd5deb4894602
Author: Till Rohrmann <[email protected]>
Date:   2015-12-15T16:37:20Z

    [FLINK-3184] [timeouts] Set default cluster side timeout to 10 s and the 
    client side timeout to 60 s.
    
    Adapt Akka failure detector timings to respect new 10 s Akka ask timeout. 
    Add logging statements to JobClientActor
    
    Introduce separation between client and cluster timeout
    
    Sets the cluster timeout to 10 s and the client timeout to 60 s.

----


> Decrease Akka timeouts on cluster side to make system more responsive
> ---------------------------------------------------------------------
>
>                 Key: FLINK-3184
>                 URL: https://issues.apache.org/jira/browse/FLINK-3184
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> Currently, the default timeout for futures is set to 100 s. This is also the 
> time used to wait between restart attempts if no other value has been 
> explicitly specified. Especially in the streaming case, it is often necessary 
> to detect and react to failures in a shorter period than 100 s. 
> Therefore, I propose to decrease the default timeout to 10 s.
> Additionally, I propose to introduce a slightly higher timeout for the client 
> side (e.g. 60 s). The reason is that in case of a {{JobManager}} failure the 
> client has to wait until the cluster has recovered. Recovery via ZooKeeper can 
> take longer than 10 s. In such a case a recovery could be falsely 
> recognized as a lost connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
