[ 
https://issues.apache.org/jira/browse/FLINK-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538463#comment-16538463
 ] 

陈梓立 commented on FLINK-9779:
----------------------------

[~aljoscha] Yes, the current behavior replaces the internal timeout with an external
one, but this timeout mechanism could be removed entirely. The other issue (9778)
is somewhat muddled, and I mistook it for a duplicate; it is related, but
not a duplicate.

> Remove SlotRequest timeout
> --------------------------
>
>                 Key: FLINK-9779
>                 URL: https://issues.apache.org/jira/browse/FLINK-9779
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager, ResourceManager, TaskManager
>            Reporter: 陈梓立
>            Priority: Major
>
> As touched on in FLINK-8643 and FLINK-8653, we use an external timeout to
> replace the internal timeout of a slot request. A follow-up question: why not
> remove this timeout mechanism entirely? In our production use case, this timeout
> mechanism causes more unnecessary failures and makes resource allocation inaccurate.
> I propose getting rid of the slot request timeout. Instead, we handle TM
> failure in the RM, which properly cancels the pending requests, and if a TM
> cannot offer a slot to the JM, we introduce a blacklist mechanism to nudge the
> RM to reallocate for the pending request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
