[ 
https://issues.apache.org/jira/browse/FLINK-23403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381829#comment-17381829
 ] 

Yang Wang commented on FLINK-23403:
-----------------------------------

I am afraid that decreasing the heartbeat timeout will have a major impact on 
production Flink workloads.

For example:
 * A full GC can take longer than 10s.
 * Even though our internal network bandwidth at Alibaba is 10Gb, we still 
see heartbeat timeouts when network pressure or the TCP retransmission rate 
is high. AFAIK, the network environment in self-built IDCs is no better 
than this.
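
To make the GC concern concrete, here is a rough sketch (not Flink code) of 
how a stall on the sending side compares against the heartbeat timeout. It 
assumes a simplified one-way model in which the stall begins just before the 
next heartbeat is due, so the receiver sees one full interval plus the stall 
without any heartbeat. The function names are made up for illustration.

```python
def worst_case_silence_s(pause_s: float, interval_s: float) -> float:
    """Longest gap between two heartbeats when the sender stalls for pause_s.

    Simplified model (assumption): the stall starts just before the next
    heartbeat is due, so the receiver waits one interval plus the stall.
    """
    return interval_s + pause_s


def times_out(pause_s: float, interval_s: float, timeout_s: float) -> bool:
    """True if the worst-case silence exceeds the heartbeat timeout."""
    return worst_case_silence_s(pause_s, interval_s) > timeout_s


if __name__ == "__main__":
    settings = [("current 50s/10s", 10, 50), ("proposed 15s/3s", 3, 15)]
    for label, interval_s, timeout_s in settings:
        for pause_s in (10, 13, 20):
            print(label, f"pause={pause_s}s",
                  "timeout" if times_out(pause_s, interval_s, timeout_s) else "ok")
```

Under this model a 13s pause stays within the 50s timeout but exceeds the 
proposed 15s one, which is why a full GC in the 10s range leaves very little 
headroom.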

> Decrease default values for heartbeat timeout and interval
> ----------------------------------------------------------
>
>                 Key: FLINK-23403
>                 URL: https://issues.apache.org/jira/browse/FLINK-23403
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Configuration, Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> In order to speed up failure detection, I suggest decreasing the default 
> values for the heartbeat timeout and interval from 50s/10s to 15s/3s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)