[ 
https://issues.apache.org/jira/browse/FLINK-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

陈梓立 updated FLINK-9779:
-----------------------
    Description: As discussed in FLINK-8643 and FLINK-8653, we use an external 
timeout to replace the internal timeout of slot requests. This raises a 
follow-up question: why not remove this timeout mechanism entirely? In our 
industrial use case, this timeout mechanism causes additional unnecessary 
failures and makes resource allocation inaccurate.  (was: As discussed in 
FLINK-8643 and FLINK-8653, we use an external timeout to replace the internal 
timeout of slot requests. This raises a follow-up question: why not remove 
this timeout mechanism entirely? In our industrial use case, this timeout 
mechanism causes additional unnecessary failures and makes resource 
allocation inaccurate.

I propose getting rid of the slot request timeout. Instead, we would handle TM 
failures in the RM, properly cancelling the pending requests, and if a TM 
cannot offer a slot to the JM, we would introduce a blacklist mechanism to 
nudge the RM to reallocate for the pending requests.)

> Remove SlotRequest timeout
> --------------------------
>
>                 Key: FLINK-9779
>                 URL: https://issues.apache.org/jira/browse/FLINK-9779
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager, ResourceManager, TaskManager
>            Reporter: 陈梓立
>            Priority: Major
>
> As discussed in FLINK-8643 and FLINK-8653, we use an external timeout to 
> replace the internal timeout of slot requests. This raises a follow-up 
> question: why not remove this timeout mechanism entirely? In our industrial 
> use case, this timeout mechanism causes additional unnecessary failures and 
> makes resource allocation inaccurate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
