[
https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell updated SPARK-2089:
-----------------------------------
Target Version/s: 1.0.0, 1.0.1
> With YARN, preferredNodeLocalityData isn't honored
> ---------------------------------------------------
>
> Key: SPARK-2089
> URL: https://issues.apache.org/jira/browse/SPARK-2089
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.0.0
> Reporter: Sandy Ryza
>
> When running in YARN cluster mode, apps can pass preferred locality data when
> constructing a Spark context that will dictate where to request executor
> containers.
> This is currently broken because of a race condition. The Spark-YARN code
> runs the user class and waits for it to start up a SparkContext. During its
> initialization, the SparkContext will create a YarnClusterScheduler, which
> notifies a monitor in the Spark-YARN code. The Spark-YARN code then
> immediately fetches the preferredNodeLocationData from the SparkContext and
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData,
> setting preferredNodeLocationData comes after the rest of the initialization,
> so, if the Spark-YARN code comes around quickly enough after being notified,
> the data that's fetched is the empty, unset version. This occurred during all
> of my runs.
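
The ordering problem described above can be sketched in a few lines. This is only a toy model, not Spark code: `ToyContext`, `on_scheduler_created`, `set_before_init`, and the host/split values are hypothetical names invented for illustration. The real race is between two threads; here the monitor callback reads the field at the instant of notification, which makes the worst-case interleaving deterministic. The second ordering (assigning the data before the rest of initialization) is one possible reordering shown for contrast, not necessarily the fix adopted in Spark.

```python
class ToyContext:
    """Hypothetical stand-in for SparkContext; not Spark's real API."""

    def __init__(self, preferred=None, on_scheduler_created=None,
                 set_before_init=False):
        self.preferred = {}  # the "empty, unset version"
        if set_before_init:
            # Alternative ordering: publish the data before anyone is notified.
            self.preferred = preferred or {}
        # "Rest of the initialization": creating the scheduler notifies the
        # monitor, which may read self.preferred right away.
        if on_scheduler_created is not None:
            on_scheduler_created(self)
        if not set_before_init:
            # Ordering described in the report: assigned only after notification.
            self.preferred = preferred or {}


def observe(set_before_init):
    """Return (what the monitor saw at notification, final value on the context)."""
    seen = []

    def monitor(ctx):
        # Stand-in for the Spark-YARN side fetching the data immediately.
        seen.append(dict(ctx.preferred))

    ctx = ToyContext({"host1": ["split0"]}, monitor, set_before_init)
    return seen[0], ctx.preferred


buggy_seen, buggy_final = observe(set_before_init=False)
fixed_seen, fixed_final = observe(set_before_init=True)

print("reported ordering, monitor saw:", buggy_seen)
print("reported ordering, final value:", buggy_final)
print("alternative ordering, monitor saw:", fixed_seen)
```

In the reported ordering the monitor observes the empty mapping even though the context ends up holding the caller's data, which matches the symptom of container requests ignoring the locality preferences.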
--
This message was sent by Atlassian JIRA
(v6.2#6252)