[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097826#comment-14097826 ]
Sandy Ryza edited comment on SPARK-2089 at 8/14/14 10:41 PM:
-------------------------------------------------------------
Hmmmm, it's true that my suggestion would require us to serialize and then
immediately deserialize a possibly huge string. How about Spark conf
properties that just specify the input file and input format? We would handle
all the logic for converting this to location preferences on the other side.
This would also simplify things for the users (just need to set properties, not
call any methods).
was (Author: sandyr):
Hmmmm, it's true that my suggestion would require us to serialize and then
immediately deserialize a possibly huge string. How about Spark conf
properties that just specify the input file and input format, and handles all
the logic for converting this to location preferences on the other side. This
would also simplify things for the users (just need to set properties, not call
any methods).
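
A rough sketch of how the conf-property idea in the comment might look. The property names, the `locality_preferences` helper, and the `split_hosts` lookup are all hypothetical; the comment names none of them, and this is not Spark's API. It only illustrates turning "input file plus input format" into per-host preferences on the receiving side.

```python
from collections import Counter

# Hypothetical property names, invented for illustration only.
INPUT_PATH_KEY = "spark.example.input.path"
INPUT_FORMAT_KEY = "spark.example.input.format"


def locality_preferences(conf, split_hosts):
    """Count how many input splits are stored on each host.

    conf: plain dict of properties set by the user.
    split_hosts: callable (path, input_format) -> list of host lists, one
        list per split. It stands in for asking the input format for its
        splits and their locations.
    """
    path = conf[INPUT_PATH_KEY]
    input_format = conf[INPUT_FORMAT_KEY]
    counts = Counter()
    for hosts in split_hosts(path, input_format):
        counts.update(hosts)
    return dict(counts)


def fake_split_hosts(path, input_format):
    # Made-up data: three splits, each replicated on two hosts.
    return [["host1", "host2"], ["host1", "host3"], ["host2", "host1"]]


conf = {
    INPUT_PATH_KEY: "hdfs:///data/events",  # hypothetical path
    INPUT_FORMAT_KEY: "org.apache.hadoop.mapred.TextInputFormat",
}
print(locality_preferences(conf, fake_split_hosts))
```

The user only sets two properties; no large serialized string has to cross the boundary, since the split lookup runs on the side that requests containers.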
> With YARN, preferredNodeLocalityData isn't honored
> ---------------------------------------------------
>
> Key: SPARK-2089
> URL: https://issues.apache.org/jira/browse/SPARK-2089
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.0.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Priority: Critical
>
> When running in YARN cluster mode, apps can pass preferred locality data when
> constructing a Spark context that will dictate where to request executor
> containers.
> This is currently broken because of a race condition. The Spark-YARN code
> runs the user class and waits for it to start up a SparkContext. During its
> initialization, the SparkContext will create a YarnClusterScheduler, which
> notifies a monitor in the Spark-YARN code. The Spark-YARN code then
> immediately fetches the preferredNodeLocationData from the SparkContext and
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData,
> setting preferredNodeLocationData comes after the rest of the initialization,
> so, if the Spark-YARN code comes around quickly enough after being notified,
> the data that's fetched is the empty unset version. This occurred during all
> of my runs.
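
The ordering problem described above can be reproduced in miniature. The sketch below is a toy model, not Spark code: `FakeContext`, `notify`, and `fetch_seen_by_yarn` are hypothetical names, and the second event exists only to force the "YARN side comes around quickly" interleaving so the result is deterministic. The `assign_first=True` variant shows one possible reordering and is not necessarily the fix adopted in Spark.

```python
import threading


class FakeContext:
    """Hypothetical stand-in for SparkContext; models only the ordering."""

    def __init__(self, location_data, notify, assign_first=False):
        self.preferred_node_location_data = {}  # empty, unset default
        if assign_first:
            self.preferred_node_location_data = location_data
        # Creating the scheduler during initialization notifies the YARN side.
        notify(self)
        if not assign_first:
            # Ordering from the report: the data is set after the notification.
            self.preferred_node_location_data = location_data


def fetch_seen_by_yarn(assign_first):
    """Return the locality data that the YARN-side thread observes."""
    created = threading.Event()  # the monitor the YARN side waits on
    fetched = threading.Event()  # forces the worst-case interleaving
    shared = {}

    def notify(ctx):
        shared["ctx"] = ctx
        created.set()
        fetched.wait()  # let the YARN side read before the constructor continues

    def yarn_side():
        created.wait()
        shared["seen"] = dict(shared["ctx"].preferred_node_location_data)
        fetched.set()

    worker = threading.Thread(target=yarn_side)
    worker.start()
    FakeContext({"host1": 2}, notify, assign_first)
    worker.join()
    return shared["seen"]


print(fetch_seen_by_yarn(assign_first=False))  # {} (the empty unset version)
print(fetch_seen_by_yarn(assign_first=True))   # {'host1': 2}
```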