[ 
https://issues.apache.org/jira/browse/APEXCORE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126034#comment-16126034
 ] 

Sanjay M Pujare commented on APEXCORE-777:
------------------------------------------

This should be addressed as part of the refactoring JIRA APEXCORE-771.

When Yarn returns 2 containers, the code processes the first allocated 
container. Because it is obviously not "already allocated", the code does not 
touch the counters and removes the outstanding request from the 
requestedResources map. Later it determines that the container is not needed 
anymore, so it creates a release request for it. While processing the second 
allocated container, the code cannot identify it as an "already allocated" 
case because the request was removed from the requestedResources map, so 
numRequestedContainers does not get incremented. It again determines that the 
container is not needed and creates a release request for it. As a result, 
numRequestedContainers stays at -1, and that is the problem.
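
To make the arithmetic concrete, here is a deliberately simplified model of 
the sequence above. It is only a sketch, not the actual Apex code: the class 
CounterDriftSketch, the priority-keyed map, and the method names are 
hypothetical stand-ins. Only the names requestedResources and 
numRequestedContainers come from the discussion.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the counting described above; NOT the Apex code. */
class CounterDriftSketch {
  // Hypothetical bookkeeping: priority -> outstanding request.
  final Map<Integer, String> requestedResources = new HashMap<>();
  int numRequestedContainers = 0;

  void request(int priority) {
    requestedResources.put(priority, "request-" + priority);
    numRequestedContainers++;
  }

  /** Processes one allocation response that may carry several containers. */
  void onAllocated(int... allocatedPriorities) {
    for (int priority : allocatedPriorities) {
      // The first container removes the request. The second one finds
      // nothing to match, so no compensating increment takes place.
      requestedResources.remove(priority);
    }
    // Bulk decrement by the number of containers in the response.
    numRequestedContainers -= allocatedPriorities.length;
  }

  /** One outstanding request, two containers returned for it. */
  static int simulate() {
    CounterDriftSketch am = new CounterDriftSketch();
    am.request(1);        // numRequestedContainers = 1
    am.onAllocated(1, 1); // 1 - 2
    return am.numRequestedContainers; // -1
  }
}
```

Under these assumptions, CounterDriftSketch.simulate() returns -1, which 
matches the counter value reported in this issue.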

Basically a request should NEVER be removed from the requestedResources map, so 
that the code can recognize "already allocated" cases even in situations such 
as this. We should have additional flags/states in the map to denote a request 
as "removed" (with the reason for the removal) so it is possible to match later 
allocations against these "removed" requests.
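
A minimal sketch of what such state tracking could look like is below. 
Everything in it is an assumption made for illustration: the State enum, the 
Entry class, the priority-keyed map, and the per-container counter updates are 
hypothetical and do not reflect the existing code or the final design for 
APEXCORE-771.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: requests stay in the map and carry a state. */
class RequestTrackingSketch {
  enum State { PENDING, ALLOCATED, REMOVED }

  static final class Entry {
    State state = State.PENDING;
    String removalReason; // set only when state == REMOVED
  }

  // Entries are never deleted; only their state changes.
  final Map<Integer, Entry> requestedResources = new HashMap<>();
  int numRequestedContainers = 0;
  int numReleased = 0;

  void request(int priority) {
    requestedResources.put(priority, new Entry());
    numRequestedContainers++;
  }

  /** Marks a pending request as removed instead of deleting it. */
  void markRemoved(int priority, String reason) {
    Entry e = requestedResources.get(priority);
    if (e != null && e.state == State.PENDING) {
      e.state = State.REMOVED;
      e.removalReason = reason;
      numRequestedContainers--;
    }
  }

  /** Returns true if the container is kept, false if it is released. */
  boolean onAllocated(int priority) {
    Entry e = requestedResources.get(priority);
    if (e != null && e.state == State.PENDING) {
      e.state = State.ALLOCATED;
      numRequestedContainers--; // counted once per matched request
      return true;
    }
    // Unknown, already allocated, or removed request: release the
    // container and leave the counter untouched.
    numReleased++;
    return false;
  }
}
```

With this bookkeeping, a request that is marked removed and then answered by 
two containers leads to two releases while numRequestedContainers stays at 0. 
The sketch adjusts the counter once per matched request rather than by a bulk 
decrement, which is a design choice of the example only.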

> Application Master may not shut down due to incorrect numRequestedContainers 
> counting
> -----------------------------------------------------------------------------------
>
>                 Key: APEXCORE-777
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-777
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Vlad Rozov
>            Priority: Minor
>
> Consider a scenario where the App Master requests a container from Yarn 
> (numRequestedContainers = 1). There are not enough resources and the request 
> times out. My understanding is that the App Master will request it again, but 
> the number of requested containers will not change (one newly requested, one 
> removed). Let's assume that by the time Yarn responds, the App Master has 
> decided that it does not need any containers. If Yarn responds with one 
> allocated container, numRequestedContainers will go to 0 (correct). But Yarn 
> may respond with 2 allocated containers if, by the time the App Master sends 
> the second request, it has already allocated a container for the original 
> request (the one that timed out), as Yarn does not guarantee that a removed 
> request will not be fulfilled (see Yarn doc). In this case, will 
> numRequestedContainers not be -1 due to the bulk decrement?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
