[ https://issues.apache.org/jira/browse/FLINK-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538463#comment-16538463 ]
陈梓立 commented on FLINK-9779:
----------------------------
[~aljoscha] Yes, the current behavior is to replace the internal timeout with an
external one, but this timeout mechanism could be removed entirely. The other
issue (FLINK-9778) is rather muddled and I mistook it for a duplicate; it is
related, but not a duplicate.
> Remove SlotRequest timeout
> --------------------------
>
> Key: FLINK-9779
> URL: https://issues.apache.org/jira/browse/FLINK-9779
> Project: Flink
> Issue Type: Improvement
> Components: JobManager, ResourceManager, TaskManager
> Reporter: 陈梓立
> Priority: Major
>
> As discussed in FLINK-8643 and FLINK-8653, we replaced the internal timeout
> of slot requests with an external timeout. This raises a question: why not
> remove this timeout mechanism entirely? In our production use case, the
> timeout causes unnecessary failures and makes resource allocation inaccurate.
> I propose getting rid of the slot request timeout. Instead, the RM would
> handle TM failure by properly cancelling the affected pending requests, and
> if a TM cannot offer a slot to the JM, a blacklist mechanism would nudge the
> RM to reallocate the pending request.
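A minimal Java sketch of the bookkeeping the proposal implies, assuming the RM tracks each pending slot request together with the TaskManager currently asked to serve it. All names here (PendingSlotRequestTracker, request, onOfferFailed, onTaskManagerFailed, onOfferAccepted) are hypothetical and are not Flink's actual ResourceManager or SlotManager API; TaskManagers and requests are plain string IDs. It illustrates the three behaviours described above: pending requests carry no deadline, requests assigned to a failed TM are cancelled, and a per-request blacklist steers reallocation away from a TM that could not offer its slot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical RM-side tracker for pending slot requests; not Flink's real API. */
class PendingSlotRequestTracker {
    // Registered TaskManagers, sorted so the assignment choice is deterministic.
    private final TreeSet<String> taskManagers = new TreeSet<>();
    // requestId -> TaskManager currently asked to serve it (absent = unassigned).
    private final Map<String, String> assignments = new HashMap<>();
    // requestId -> TaskManagers that already failed to offer a slot for it.
    private final Map<String, Set<String>> blacklist = new HashMap<>();
    // Requests still waiting for a slot; note that no deadline is attached.
    private final Set<String> pending = new HashSet<>();

    void registerTaskManager(String tmId) {
        taskManagers.add(tmId);
        // Requests that found no TM earlier are retried here instead of timing out.
        for (String requestId : pending) {
            if (!assignments.containsKey(requestId)) {
                tryAssign(requestId);
            }
        }
    }

    /** Adds a request and tries to assign it; it stays pending if no TM fits. */
    Optional<String> request(String requestId) {
        pending.add(requestId);
        return tryAssign(requestId);
    }

    // Picks the first registered TM that is not blacklisted for this request.
    private Optional<String> tryAssign(String requestId) {
        Set<String> excluded = blacklist.getOrDefault(requestId, Set.of());
        for (String tmId : taskManagers) {
            if (!excluded.contains(tmId)) {
                assignments.put(requestId, tmId);
                return Optional.of(tmId);
            }
        }
        assignments.remove(requestId);
        return Optional.empty();
    }

    /** The TM could not offer the slot to the JM: blacklist it and reallocate. */
    Optional<String> onOfferFailed(String requestId, String tmId) {
        blacklist.computeIfAbsent(requestId, k -> new HashSet<>()).add(tmId);
        return tryAssign(requestId);
    }

    /** The TM failed: drop it and cancel every pending request assigned to it. */
    List<String> onTaskManagerFailed(String tmId) {
        taskManagers.remove(tmId);
        List<String> cancelled = new ArrayList<>();
        for (String requestId : new ArrayList<>(assignments.keySet())) {
            if (tmId.equals(assignments.get(requestId))) {
                assignments.remove(requestId);
                pending.remove(requestId);
                blacklist.remove(requestId);
                cancelled.add(requestId);
            }
        }
        return cancelled;
    }

    /** The slot was offered and accepted: the request is fulfilled. */
    void onOfferAccepted(String requestId) {
        pending.remove(requestId);
        assignments.remove(requestId);
        blacklist.remove(requestId);
    }

    boolean isPending(String requestId) {
        return pending.contains(requestId);
    }
}
```

One consequence of this design: without a timeout, a request whose blacklist covers every registered TM simply stays pending until a new TM registers, so some other signal (for example the job being cancelled) would have to bound the wait.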
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)