[
https://issues.apache.org/jira/browse/APEXCORE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652496#comment-15652496
]
Venkatesh Kottapalli commented on APEXCORE-471:
-----------------------------------------------
In the issue scenario, other jobs were using 177 of the 180 containers in the
cluster. When the Apex job was launched, it needed 20 containers and initially
received the 3 remaining containers. After that, the App Master sent no
further request to the RM for the remaining 17 containers, and the job waited
in the pending state forever, even after the other jobs in the cluster had
completed and all the containers were available.
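
To make the expected behavior concrete, here is a minimal sketch of the
bookkeeping that would let the App Master re-request the missing containers.
It is only an illustration under stated assumptions: the class
ContainerRequestTracker, its methods, and the timeout value are hypothetical
and are not taken from Apex or from the YARN client API. The actual call that
would send the request to the RM is deliberately left out.

```java
/**
 * Hypothetical sketch (not Apex or YARN code): tracks how many containers
 * are still missing and decides when the request should be resubmitted.
 */
class ContainerRequestTracker {
    private final int required;             // containers the job needs in total
    private final long resubmitAfterMillis; // assumed wait before asking again
    private int allocated;                  // containers granted so far
    private long lastRequestMillis;         // time of the most recent request

    ContainerRequestTracker(int required, long resubmitAfterMillis, long nowMillis) {
        this.required = required;
        this.resubmitAfterMillis = resubmitAfterMillis;
        this.lastRequestMillis = nowMillis;
    }

    /** Record containers granted by the resource manager. */
    void onAllocated(int count) {
        allocated = Math.min(required, allocated + count);
    }

    /** Containers that are still needed. */
    int outstanding() {
        return required - allocated;
    }

    /**
     * Returns the number of containers to request again now, or 0 if nothing
     * is missing or the previous request is still recent.
     */
    int containersToResubmit(long nowMillis) {
        int missing = outstanding();
        if (missing == 0 || nowMillis - lastRequestMillis < resubmitAfterMillis) {
            return 0;
        }
        lastRequestMillis = nowMillis; // the caller is expected to send the request
        return missing;
    }

    public static void main(String[] args) {
        // Scenario from this comment: 20 containers needed, 3 granted at first.
        ContainerRequestTracker tracker = new ContainerRequestTracker(20, 1000L, 0L);
        tracker.onAllocated(3);
        System.out.println(tracker.containersToResubmit(500L));  // 0
        System.out.println(tracker.containersToResubmit(1500L)); // 17
    }
}
```

With a check like this running on every heartbeat, the 17 missing containers
would be requested again once the other jobs finish, instead of the job
staying in the pending state.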
> Requests for container allocation are not resubmitted
> -----------------------------------------------------
>
> Key: APEXCORE-471
> URL: https://issues.apache.org/jira/browse/APEXCORE-471
> Project: Apache Apex Core
> Issue Type: Bug
> Affects Versions: 3.3.0, 3.4.0
> Reporter: Vlad Rozov
>
> When a YARN cluster has a limited number of available resources, requests
> should be resubmitted. BlacklistBasedResourceRequestHandler does not properly
> handle the case when resources are limited.