[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

Saisai Shao (JIRA) Wed, 28 Oct 2015 17:53:47 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979586#comment-14979586
 ]


Saisai Shao commented on SPARK-2089:
------------------------------------

Hi [~sandyr] and [~tgraves], thanks a lot for your comments. I'm not saying 
which one is better, dynamic allocation or static allocation. I'm saying that 
locality hints cannot be specified with static allocation, which is a missing 
piece. Unless dynamic allocation is the only choice, there will always be a 
gap between these two choices.

From my own experience, I only enable dynamic allocation for long-running 
services, not for batch workloads, and especially not for benchmarks, where I 
need tight control over the number of containers. I'm not sure what the usage 
pattern of most users is, or whether they enable dynamic allocation in all 
cases. If they do, I think it is not necessary to address this issue. So I'd 
like to collect some feedback here to learn whether we need to address this 
problem.

> With YARN, preferredNodeLocalityData isn't honored 
> ---------------------------------------------------
>
>                 Key: SPARK-2089
>                 URL: https://issues.apache.org/jira/browse/SPARK-2089
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Critical
>
> When running in YARN cluster mode, apps can pass preferred locality data when 
> constructing a Spark context that will dictate where to request executor 
> containers.
> This is currently broken because of a race condition.  The Spark-YARN code 
> runs the user class and waits for it to start up a SparkContext.  During its 
> initialization, the SparkContext will create a YarnClusterScheduler, which 
> notifies a monitor in the Spark-YARN code that .  The Spark-YARN code then 
> immediately fetches the preferredNodeLocationData from the SparkContext and 
> uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData, 
> setting preferredNodeLocationData comes after the rest of the initialization, 
> so, if the Spark-YARN code comes around quickly enough after being notified, 
> the data that's fetched is the empty unset version.  This occurred during all 
> of my runs.
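
The ordering problem described in the issue can be illustrated with a small toy model. This is only a sketch in Python, not Spark's code: `run_once`, `ToyContext`, `yarn_side`, and the two events are hypothetical stand-ins, and the `fetched` event deliberately forces the interleaving in which the YARN side reads the field right after being notified.

```python
import threading


def run_once(assign_before_notify):
    """Return the locality hints the simulated YARN side observes."""
    hints = {"host1": ["split0"], "host2": ["split1"]}  # made-up sample data
    started = threading.Event()  # stands in for the monitor notification
    fetched = threading.Event()  # forces the "quick enough" interleaving
    holder = {}                  # where the YARN side finds the context
    seen = {}

    class ToyContext:
        """Hypothetical stand-in for SparkContext."""

        def __init__(self):
            self.preferred_node_location_data = {}  # empty, unset version
            if assign_before_notify:
                # Possible remedy: publish the hints before notifying.
                self.preferred_node_location_data = hints
            holder["ctx"] = self
            started.set()            # scheduler creation notifies the monitor
            fetched.wait(timeout=5)  # let the other thread read first
            if not assign_before_notify:
                # Order described in the issue: hints are assigned last.
                self.preferred_node_location_data = hints

    def yarn_side():
        started.wait(timeout=5)
        # Fetch the hints immediately after being notified.
        seen["hints"] = dict(holder["ctx"].preferred_node_location_data)
        fetched.set()

    worker = threading.Thread(target=yarn_side)
    worker.start()
    ToyContext()
    worker.join()
    return seen["hints"]
```

In this model, `run_once(False)` returns an empty dict because the hints are assigned only after the fetch, while `run_once(True)` returns the full hints. Assigning the field before the notification is one way to close the window; whether that matches the eventual fix in Spark is not stated here.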



