[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979586#comment-14979586 ]
Saisai Shao commented on SPARK-2089:
------------------------------------

Hi [~sandyr] and [~tgraves], thanks a lot for your comments.

I'm not saying which one is better, dynamic allocation or static allocation. I'm saying that locality hints cannot be specified with static allocation. This is a missing piece: unless dynamic allocation is the only choice, there will always be a gap between these two choices.

From my own experience, I only enable dynamic allocation for long-running services, not for batch workloads, and especially not for benchmarks, where I need tight control over the number of containers. I'm not sure what the usage pattern of most users is, or whether they enable dynamic allocation in all cases. If they do, I think it is not necessary to address this issue. So I'd like to collect some feedback here on whether we need to address this problem.

> With YARN, preferredNodeLocalityData isn't honored
> ---------------------------------------------------
>
>                 Key: SPARK-2089
>                 URL: https://issues.apache.org/jira/browse/SPARK-2089
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Critical
>
> When running in YARN cluster mode, apps can pass preferred locality data when
> constructing a Spark context that will dictate where to request executor
> containers.
> This is currently broken because of a race condition. The Spark-YARN code
> runs the user class and waits for it to start up a SparkContext. During its
> initialization, the SparkContext will create a YarnClusterScheduler, which
> notifies a monitor in the Spark-YARN code that . The Spark-YARN code then
> immediately fetches the preferredNodeLocationData from the SparkContext and
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData,
> setting preferredNodeLocationData comes after the rest of the initialization,
> so, if the Spark-YARN code comes around quickly enough after being notified,
> the data that's fetched is the empty unset version. This occurred during all
> of my runs.
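
The race in the quoted description can be sketched roughly as follows. This is a small Python model of the ordering problem only, not Spark code: `FakeContext`, `notify`, `allocator`, and `run` are hypothetical names, and the `fetched` event is an assumption added to force the "quickly enough" case so the outcome is deterministic. The last call shows one possible reordering (assign the data before notifying); the source does not say this is the fix that was adopted.

```python
import threading


class FakeContext:
    """Hypothetical stand-in for a context object; not Spark's real API."""

    def __init__(self, locality, notify, assign_first):
        self.preferred_locality = {}  # the empty, unset default
        if assign_first:
            # Alternative ordering: publish the data before notifying.
            self.preferred_locality = locality
        # Models scheduler creation notifying the waiting monitor.
        notify(self)
        if not assign_first:
            # Ordering from the issue: assignment comes after the rest of init.
            self.preferred_locality = locality


def run(assign_first):
    ready = threading.Event()    # "the context has started"
    fetched = threading.Event()  # assumption: forces the waiter to win the race
    box = {}

    def notify(ctx):
        box["ctx"] = ctx
        ready.set()
        # Pause the constructor until the waiter has read the data.
        fetched.wait(timeout=5)

    def allocator():
        # Models the code that waits for the context, then fetches the hints.
        ready.wait(timeout=5)
        box["seen"] = dict(box["ctx"].preferred_locality)
        fetched.set()

    waiter = threading.Thread(target=allocator)
    waiter.start()
    FakeContext({"host1": ["split0"]}, notify, assign_first)
    waiter.join()
    return box["seen"]


buggy = run(assign_first=False)  # waiter sees the empty default
fixed = run(assign_first=True)   # waiter sees the locality hints
```

With the ordering from the issue, the waiter copies the empty default because it reads between the notification and the assignment. Moving the assignment ahead of the notification removes that window in this model.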