This happens because YARN's AM client (AMRMClient) does not remove a
fulfilled container request from its internal request table until the
application's AM explicitly calls removeContainerRequest for that request.
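
For illustration, here is a minimal Scala sketch of that contract against
the AMRMClient API (not Spark's actual code; the resource size and priority
are made up, and AM registration is assumed to have happened already):

    import org.apache.hadoop.yarn.api.records.{Priority, Resource}
    import org.apache.hadoop.yarn.client.api.AMRMClient
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import scala.collection.JavaConverters._

    val amClient = AMRMClient.createAMRMClient[ContainerRequest]()
    amClient.init(new YarnConfiguration())
    amClient.start()
    // registerApplicationMaster(...) is assumed to have been called already.

    val capability = Resource.newInstance(8192, 4)  // illustrative executor size
    val request = new ContainerRequest(capability, null, null, Priority.newInstance(1))
    amClient.addContainerRequest(request)

    val response = amClient.allocate(0.1f)
    if (!response.getAllocatedContainers.isEmpty) {
      // Without this call the fulfilled request stays in the client's table
      // and is re-sent in the ask[] of every subsequent allocate().
      amClient.removeContainerRequest(request)
    }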

Spark 1.2: Spark's YARN AM never calls removeContainerRequest for fulfilled
container requests.

Spark 1.3: the YARN AM calls removeContainerRequest for the container
requests it can match to allocated containers (a sketch of that matching
follows below). I tried the same test case of killing one executor with
Spark 1.3, and the ask[] in that case was for just 1 container.
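
Continuing the sketch above, this is roughly what the 1.3-style fix does per
allocated container; it is simplified to host-level matching (the real
allocator also considers rack and ANY), and the helper name is mine:

    import org.apache.hadoop.yarn.api.records.Container

    def matchAndRemove(amClient: AMRMClient[ContainerRequest],
                       container: Container): Unit = {
      // Look up outstanding requests this container could satisfy.
      val matching = amClient.getMatchingRequests(
        container.getPriority, container.getNodeId.getHost, container.getResource)
      if (!matching.isEmpty && !matching.get(0).isEmpty) {
        // Drop one matching request so it is not re-sent in later asks.
        amClient.removeContainerRequest(matching.get(0).iterator().next())
      }
    }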

As long as the cluster is large enough to satisfy the bloated container
requests, the containers are handed to Spark's YARN allocator in the
allocate response; the allocator launches new executors only for the number
it is actually missing and releases the extra allocated containers (see the
sketch below).
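
A rough sketch of that allocator-side handling, continuing the code above;
targetNumExecutors, numRunningExecutors, and launchExecutor are stand-ins
for the allocator's internal state, not Spark's exact names:

    val allocated = response.getAllocatedContainers.asScala
    val missing = targetNumExecutors - numRunningExecutors  // hypothetical counters
    val (toLaunch, surplus) = allocated.splitAt(missing)
    toLaunch.foreach(launchExecutor)                        // hypothetical launch helper
    // Surplus containers from the bloated ask go straight back to the RM.
    surplus.foreach(c => amClient.releaseAssignedContainer(c.getId))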

The problem is magnified for long-running jobs with large executor memory
requirements. Whenever an executor gets killed, the next ask to the YARN
ResourceManager (RM) is for n+1 containers (n being the count of
already-requested containers). The RM may serve this if it still has enough
resources; otherwise it starts reserving cluster resources for containers
that Spark does not need in the first place.
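
A toy trace of that growth (the numbers are made up purely for illustration):

    var outstanding = 50                 // requests the AM client still holds
    (1 to 3).foreach { kill =>
      outstanding += 1                   // one replacement request per killed executor
      println(s"kill #$kill -> ask[] asks for $outstanding containers, 1 actually needed")
    }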

This causes a resource crunch for other applications and inefficient
utilization of cluster resources.

I was not able to find a thread discussing this specifically. Is there a
known use case because of which removeContainerRequest is not called in
Spark 1.2?




