[
https://issues.apache.org/jira/browse/SPARK-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue updated SPARK-13779:
------------------------------
Description:
SPARK-9817 attempts to improve locality by considering the set of pending
container requests. Pending requests with a locality preference that is no
longer needed or no locality preference are cancelled, then resubmitted with
updated locality preferences.
When running over data in S3, some stages have no locality information, so the
current logic cancels all pending requests and resubmits them (still without
locality preferences) on every call to
[{{updateResourceRequests()}}|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L273].
I propose the following update to avoid this problem:
# Cancel any pending requests with stale locality preferences
# Calculate N new requests, where N is the number of new containers + cancelled
stale requests + outstanding requests without locality
# If the number of new requests with a locality preference is larger than the
available count (new containers + cancelled stale requests), then cancel enough
requests with no locality preference to be able to submit all of the requests
with a locality preference.
# If the number of new requests with a locality preference is smaller than the
available count, then submit all of the locality requests plus a request with no
locality preference for the remaining available count. No pending requests with
no locality preference are cancelled.
This strategy only cancels requests with no locality preference if a new
request can be made that has a locality preference. Cancelling stale locality
requests happens as it does today. I've tested this on large S3 jobs (50,000+
tasks) and it fixes the request thrashing problem.
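The counting logic in these steps can be sketched as follows. This is a minimal, hypothetical illustration; {{planRequests}} and its parameter names are illustrative, not actual YarnAllocator fields:

```scala
// Sketch of the proposed planning step (hypothetical names, not real
// YarnAllocator fields). Given the candidate counts, decide how many
// any-host requests to cancel and what to submit.
def planRequests(
    newContainers: Int,     // newly needed containers
    staleLocality: Int,     // pending requests cancelled for stale locality
    pendingAnyHost: Int,    // pending requests with no locality preference
    localityPreferred: Int  // new requests that carry a locality preference
): (Int, Int, Int) = {      // (anyHostToCancel, localityToSubmit, anyHostToSubmit)
  // Slots that can be filled without touching any-host requests:
  // new containers plus cancelled stale-locality requests.
  val available = newContainers + staleLocality
  if (localityPreferred > available) {
    // Cancel just enough any-host requests so every locality-preferred
    // request can be submitted.
    val cancelAnyHost = math.min(localityPreferred - available, pendingAnyHost)
    (cancelAnyHost, localityPreferred, 0)
  } else {
    // Submit all locality requests and fill the remainder with any-host
    // requests; existing any-host requests stay pending.
    (0, localityPreferred, available - localityPreferred)
  }
}
```

Note that for a stage with no locality information at all ({{localityPreferred == 0}}), this plan never cancels pending any-host requests, which is what avoids the thrashing.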
was:
SPARK-9817 attempts to improve locality by considering the set of pending
container requests. Pending requests with a locality preference that is no
longer needed or no locality preference are cancelled, then resubmitted with
updated locality preferences.
When running over data in S3, some stages have no locality information so the
result is that the current logic cancels all pending requests and resubmits
them (still without locality preferences) on every call to
[`updateResourceRequests()`|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L273].
I propose the following update to avoid this problem:
# Cancel any pending requests with stale locality preferences
# Calculate N new requests, where N is the number of new containers + cancelled
stale requests + outstanding requests without locality
# If the number of new requests with a locality preference is larger than the
available count (new containers + cancelled stale requests), then cancel enough
requests with no locality preference to be able to submit all of the requests
with a locality preference.
# If the number of new requests with a locality preference is smaller than the
available count then submit all of the locality requests and a request with no
locality preference for the remaining available count. No pending requests with
no locality are cancelled.
This strategy only cancels requests with no locality preference if a new
request can be made that has a locality preference. Cancelling stale locality
requests happens as it does today. I've tested this on large S3 jobs (50,000+
tasks) and it fixes the request thrashing problem.
> YarnAllocator cancels and resubmits container requests with no locality
> preference
> ----------------------------------------------------------------------------------
>
> Key: SPARK-13779
> URL: https://issues.apache.org/jira/browse/SPARK-13779
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.6.0
> Reporter: Ryan Blue
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)