[
https://issues.apache.org/jira/browse/FLINK-13555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann reassigned FLINK-13555:
-------------------------------------
Assignee: Xintong Song
> Failures of slot requests requiring unfulfillable managed memory should not
> be ignored.
> ---------------------------------------------------------------------------------------
>
> Key: FLINK-13555
> URL: https://issues.apache.org/jira/browse/FLINK-13555
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Xintong Song
> Assignee: Xintong Song
> Priority: Blocker
> Fix For: 1.9.0
>
> Attachments: flink-unk-standalonesession-0-u-home.log,
> flink-unk-taskexecutor-0-u-home.log
>
>
> Currently, SlotPool ignores failures of slot requests to the ResourceManager
> for all batch slot requests. The idea behind this is to let batch slot
> requests stay pending at the SlotPool while waiting for other tasks to finish
> and release slots. A slot request is failed only if it is not fulfilled
> within its timeout.
> However, a slot request to the RM can fail for two different reasons:
> # The RM has no available slots. All slots are in use at the moment, but
> they might become available later when the currently running tasks finish.
> # The slot request requires more resources than any slot (available or not)
> in the cluster can provide. Such a request is unlikely to be fulfilled
> later either.
> For the second kind of failure, it doesn't make sense to wait for the timeout.
> We should fail the job immediately, with a proper error message describing the
> problem and suggesting that the user tune the job or cluster configuration.
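
The distinction between the two failure kinds described in this issue could be sketched roughly as follows. This is only an illustration and not Flink's actual SlotPool or ResourceManager code: the class SlotRequestFailureSketch, the Slot model, the Outcome enum and the classify method are all hypothetical names, and resources are reduced to a single managed-memory number for simplicity.

```java
import java.util.Arrays;
import java.util.List;

public class SlotRequestFailureSketch {

    /** Hypothetical slot model: total managed memory in MB and whether a task occupies it. */
    static final class Slot {
        final int managedMemoryMb;
        final boolean inUse;

        Slot(int managedMemoryMb, boolean inUse) {
            this.managedMemoryMb = managedMemoryMb;
            this.inUse = inUse;
        }
    }

    /** Hypothetical outcomes; the last two correspond to the two failure kinds in the issue. */
    enum Outcome { FULFILLABLE_NOW, RETRY_LATER, UNFULFILLABLE }

    /** Classifies a request against every slot in the cluster, free or occupied. */
    static Outcome classify(int requestedManagedMemoryMb, List<Slot> allSlots) {
        boolean couldFitLater = false;
        for (Slot slot : allSlots) {
            if (slot.managedMemoryMb >= requestedManagedMemoryMb) {
                if (!slot.inUse) {
                    return Outcome.FULFILLABLE_NOW; // a free slot is large enough
                }
                couldFitLater = true; // large enough, but currently occupied
            }
        }
        // Kind 1: some slot would fit once released, so keeping the request pending makes sense.
        // Kind 2: no slot in the cluster can ever fit, so waiting for the timeout is pointless.
        return couldFitLater ? Outcome.RETRY_LATER : Outcome.UNFULFILLABLE;
    }

    public static void main(String[] args) {
        List<Slot> cluster = Arrays.asList(new Slot(512, true), new Slot(512, true));
        System.out.println(classify(256, cluster));  // prints RETRY_LATER
        System.out.println(classify(2048, cluster)); // prints UNFULFILLABLE
    }
}
```

In this sketch, only the UNFULFILLABLE outcome would be a reason to fail the job immediately; RETRY_LATER matches the existing behaviour of leaving a batch request pending until its timeout.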
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)