[ https://issues.apache.org/jira/browse/SPARK-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-13779:
------------------------------------

    Assignee: Apache Spark

> YarnAllocator cancels and resubmits container requests with no locality preference
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-13779
>                 URL: https://issues.apache.org/jira/browse/SPARK-13779
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.6.0
>            Reporter: Ryan Blue
>            Assignee: Apache Spark
>
> SPARK-9817 attempts to improve locality by considering the set of pending
> container requests. Pending requests whose locality preference is no longer
> needed, and pending requests with no locality preference, are cancelled and
> then resubmitted with updated locality preferences.
> When running over data in S3, some stages have no locality information, so
> the current logic cancels all pending requests and resubmits them (still
> without locality preferences) on every call to
> [{{updateResourceRequests()}}|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L273].
> I propose the following update to avoid this problem:
> # Cancel any pending requests with stale locality preferences.
> # Calculate N new requests, where N is the number of new containers +
> cancelled stale requests + outstanding requests without locality.
> # If the number of new requests with a locality preference is larger than
> the available count (new containers + cancelled stale requests), then cancel
> enough requests with no locality preference to be able to submit all of the
> requests with a locality preference.
> # If the number of new requests with a locality preference is smaller than
> the available count, then submit all of the locality requests and requests
> with no locality preference for the remaining available count. No pending
> requests with no locality preference are cancelled.
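The four steps above can be sketched as plain arithmetic over request counts. This is only an illustration under stated assumptions, not the YarnAllocator code or the eventual patch: the names plan_requests, RequestPlan and locality_demand are hypothetical, and the sketch assumes that the number of new requests carrying a locality preference is simply capped by locality_demand, whereas the real allocator derives it from its own placement logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestPlan:
    """Counts of container requests to cancel and submit in one update."""
    cancel_stale: int
    cancel_no_locality: int
    submit_with_locality: int
    submit_without_locality: int


def plan_requests(new_containers, stale_pending, no_locality_pending,
                  locality_demand):
    """Hypothetical helper that follows the four proposed steps.

    locality_demand is an assumed input: the maximum number of requests
    for which a locality preference could currently be computed.
    """
    # Step 1: every pending request with a stale locality preference
    # is cancelled.
    cancel_stale = stale_pending

    # Slots that can be filled without touching no-locality requests.
    available = new_containers + cancel_stale

    # Step 2: N candidate requests = new containers + cancelled stale
    # requests + outstanding requests without locality.
    candidates = available + no_locality_pending

    # Assumption: of those N candidates, at most locality_demand carry
    # a locality preference.
    with_locality = min(candidates, locality_demand)

    if with_locality > available:
        # Step 3: cancel just enough no-locality requests so that every
        # request with a locality preference can be submitted.
        cancel_no_locality = with_locality - available
        submit_without_locality = 0
    else:
        # Step 4 (also covers the equal case): submit all locality
        # requests, fill the rest of the available count without
        # locality, and leave pending no-locality requests alone.
        cancel_no_locality = 0
        submit_without_locality = available - with_locality

    return RequestPlan(cancel_stale, cancel_no_locality,
                       with_locality, submit_without_locality)
```

In the S3 case described above (no new containers, no stale requests, no locality information), plan_requests(0, 0, 100, 0) yields a plan of all zeros, so the 100 pending requests are left in place instead of being cancelled and resubmitted. In both branches the number of pending requests after the update equals the previous pending count plus new_containers.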
> This strategy only cancels requests with no locality preference if a new
> request can be made that has a locality preference. Cancelling stale locality
> requests happens as it does today. I've tested this on large S3 jobs (50,000+
> tasks) and it fixes the request thrashing problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)