[jira] [Comment Edited] (FLINK-17871) Make the default value of attemptFailuresValidityInterval more reasonable

fanxin (Jira) Tue, 24 Nov 2020 02:45:07 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-17871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238030#comment-17238030
 ]


fanxin edited comment on FLINK-17871 at 11/24/20, 10:44 AM:
------------------------------------------------------------

hi, [~azagrebin] [~chesnay] 
 Sorry for the late reply. We have a YARN cluster with about 200 nodes (128G, 32 
cores). Based on our experience, the default value should be greater than 30s. 
For a single Flink-on-YARN job, distributing its jars and launching the 
AppMaster takes at least 10s if we don't put these jars in HDFS in advance.


was (Author: fanxiin):
hi, [~azagrebin] [~chesnay] 
 Sorry for the late reply. We have a YARN cluster with about 200 nodes (128G, 32 
cores and HDD). Based on our experience, the default value should be greater 
than 30s. For a single Flink-on-YARN job, distributing its jars and launching 
the AppMaster takes at least 10s if we don't put these jars in HDFS in advance.

> Make the default value of attemptFailuresValidityInterval more reasonable
> -------------------------------------------------------------------------
>
>                 Key: FLINK-17871
>                 URL: https://issues.apache.org/jira/browse/FLINK-17871
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>            Reporter: fanxin
>            Priority: Minor
>
> The default value of `yarn.application-attempt-failures-validity-interval` is 
> currently `10000` milliseconds. Preparing the runtime context alone usually 
> takes several seconds, which means the default of 10000 ms is too small even 
> to prepare the runtime context. With the default config, a Flink-on-YARN job 
> will hardly ever meet the condition of "fail 2 times in 10s". If the job has 
> an internal problem, it can therefore easily get bogged down in endless 
> retries.
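
To illustrate the reasoning in the description above, here is a minimal Python sketch of a sliding-window failure counter. It is a simplified model, not YARN's or Flink's actual code: the function name `attempts_until_giveup`, the strict `<` window comparison, and the 15 s spacing between failures are all assumptions chosen for illustration. The attempt limit of 2 and the 10000 ms window come from the description.

```python
def attempts_until_giveup(failure_times_ms, max_attempts, validity_interval_ms):
    """Return the 1-based index of the failure at which the application is
    given up, or None if the limit is never reached.

    Simplified model (assumption): only failures that happened less than
    validity_interval_ms before the current failure count toward the limit.
    """
    recent = []
    for i, t in enumerate(failure_times_ms, start=1):
        # Forget failures that have fallen out of the validity window.
        recent = [f for f in recent if t - f < validity_interval_ms]
        recent.append(t)
        if len(recent) >= max_attempts:
            return i
    return None


# Hypothetical job whose attempts fail 15 s apart (start-up alone takes
# at least 10 s according to the comment above).
failures = [15_000 * k for k in range(1, 11)]

print(attempts_until_giveup(failures, 2, 10_000))  # None: retries never stop
print(attempts_until_giveup(failures, 2, 60_000))  # 2: stops at the 2nd failure
```

Under these assumptions, a 10 s window never sees two failures at once, so the attempt limit is never hit, while a window longer than the gap between failures stops the job on the second failure.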



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
