[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039236#comment-14039236 ]
Mridul Muralidharan commented on SPARK-2089: -------------------------------------------- [~pwendell] SplitInfo is not from hadoop - but gives locality preference in context spark (see org.apache.spark.scheduler.SplitInfo) in a reasonably api agnostic way. The default support provided for it is hadoop specific based on dfs blocks - but I dont think there is anything stopping us from expressing other forms (either already currently or with minor modifications as applicable). We actually very heavily use that api - moving 10s or 100s of TB of data tends to be fairly expensive :-) Since we are still stuck in 0.9 + changes, have not yet faced this issue though, so great to see this being addressed. > With YARN, preferredNodeLocalityData isn't honored > --------------------------------------------------- > > Key: SPARK-2089 > URL: https://issues.apache.org/jira/browse/SPARK-2089 > Project: Spark > Issue Type: Bug > Components: YARN > Affects Versions: 1.0.0 > Reporter: Sandy Ryza > Assignee: Sandy Ryza > Priority: Critical > > When running in YARN cluster mode, apps can pass preferred locality data when > constructing a Spark context that will dictate where to request executor > containers. > This is currently broken because of a race condition. The Spark-YARN code > runs the user class and waits for it to start up a SparkContext. During its > initialization, the SparkContext will create a YarnClusterScheduler, which > notifies a monitor in the Spark-YARN code that . The Spark-Yarn code then > immediately fetches the preferredNodeLocationData from the SparkContext and > uses it to start requesting containers. > But in the SparkContext constructor that takes the preferredNodeLocationData, > setting preferredNodeLocationData comes after the rest of the initialization, > so, if the Spark-YARN code comes around quickly enough after being notified, > the data that's fetched is the empty unset version. The occurred during all > of my runs. -- This message was sent by Atlassian JIRA (v6.2#6252)