[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980717#comment-14980717 ]
Sandy Ryza commented on SPARK-2089:
-----------------------------------
My opinion is that we should be moving towards dynamic allocation as the norm,
both for batch and long-running applications. With dynamic allocation turned
on, you can get close to the behavior of static allocation by capping the
maximum number of executors and configuring a very fast ramp-up time.
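
As a rough illustration of that setup, here is a minimal sketch. The property
names are standard Spark configuration keys, but the values (20 executors, 1s
backlog timeouts) and the helper to_submit_args are assumptions chosen for
illustration, not recommendations from this thread.

```python
# Sketch only: approximating static allocation with dynamic allocation.
# The property names are Spark configuration keys; every value below is an
# illustrative assumption.
dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # Dynamic allocation on YARN relies on the external shuffle service.
    "spark.shuffle.service.enabled": "true",
    # Cap the executor count at what a static job would have requested.
    "spark.dynamicAllocation.maxExecutors": "20",
    "spark.dynamicAllocation.initialExecutors": "20",
    # Short backlog timeouts make the ramp-up fast.
    "spark.dynamicAllocation.schedulerBacklogTimeout": "1s",
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout": "1s",
}


def to_submit_args(conf):
    """Hypothetical helper: render a dict as spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(dynamic_allocation_conf)))
```

The same keys could equally be placed in spark-defaults.conf; the helper is
only there to keep the sketch runnable.
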
> With YARN, preferredNodeLocalityData isn't honored
> ---------------------------------------------------
>
> Key: SPARK-2089
> URL: https://issues.apache.org/jira/browse/SPARK-2089
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.0.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Priority: Critical
>
> When running in YARN cluster mode, apps can pass preferred locality data when
> constructing a Spark context that will dictate where to request executor
> containers.
> This is currently broken because of a race condition. The Spark-YARN code
> runs the user class and waits for it to start up a SparkContext. During its
> initialization, the SparkContext will create a YarnClusterScheduler, which
> notifies a monitor in the Spark-YARN code. The Spark-YARN code then
> immediately fetches the preferredNodeLocationData from the SparkContext and
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData,
> setting preferredNodeLocationData comes after the rest of the initialization,
> so, if the Spark-YARN code comes around quickly enough after being notified,
> the data that's fetched is the empty unset version. This occurred during all
> of my runs.
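
The ordering problem described above can be sketched in a few lines. This is
a simplified model, not Spark's actual code: Monitor, SketchContext,
preferred_locations, and run are hypothetical names, and the extra "fetched"
event exists only to force the unlucky interleaving deterministically. The
set_before_notify=True branch shows one possible ordering that avoids the
race; it is not necessarily the fix adopted in Spark.

```python
import threading


class Monitor:
    """Hypothetical stand-in for the Spark-YARN side waiting on the context."""

    def __init__(self):
        self.ready = threading.Event()
        self.fetched = threading.Event()
        self.context = None
        self.seen = None

    def notify(self, context):
        # Called from inside the context's initialization.
        self.context = context
        self.ready.set()

    def wait_and_fetch(self):
        self.ready.wait(timeout=5)
        # Fetch the locality data immediately after being notified.
        self.seen = dict(self.context.preferred_locations)
        self.fetched.set()


class SketchContext:
    """Hypothetical stand-in for SparkContext; not Spark's real API."""

    def __init__(self, monitor, preferred_locations, set_before_notify):
        self.preferred_locations = {}  # the empty, unset default
        if set_before_notify:
            self.preferred_locations = preferred_locations
        # Scheduler creation notifies the monitor during initialization.
        monitor.notify(self)
        # Assumption for the demo: wait so the monitor always reads first.
        monitor.fetched.wait(timeout=5)
        if not set_before_notify:
            # Buggy ordering: assigned only after the rest of initialization.
            self.preferred_locations = preferred_locations


def run(set_before_notify):
    monitor = Monitor()
    fetcher = threading.Thread(target=monitor.wait_and_fetch)
    fetcher.start()
    SketchContext(monitor, {"host1": 2}, set_before_notify)
    fetcher.join()
    return monitor.seen


if __name__ == "__main__":
    print("set after notify: ", run(False))
    print("set before notify:", run(True))
```

With the buggy ordering the monitor sees the empty default; with the field
assigned before the notification it sees the data the application passed in.
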