[ https://issues.apache.org/jira/browse/SPARK-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-2669:
-----------------------------
    Affects Version/s: 1.0.0

> Hadoop configuration is not localised when submitting job in yarn-cluster mode
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-2669
>                 URL: https://issues.apache.org/jira/browse/SPARK-2669
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Maxim Ivanov
>
> I'd like to propose a fix for a problem where the Hadoop configuration is 
> not localized when a job is submitted in yarn-cluster mode. Here is the 
> description from the GitHub pull request https://github.com/apache/spark/pull/1574:
> This patch fixes a problem where the Spark driver, when run in a
> container managed by the YARN ResourceManager, inherits its configuration
> from the NodeManager process, which can differ from the Hadoop
> configuration present on the client (the submitting machine). The problem
> is most visible when the fs.defaultFS property differs between the two.
> Hadoop MR solves this by serializing the client's Hadoop configuration
> into job.xml in the application staging directory and then making the
> Application Master use it. That guarantees that, regardless of the
> execution nodes' configurations, all application containers use the same
> configuration as the client.
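> As a rough sketch (my code, not the MR sources), the serialization step
> amounts to dumping the client's effective configuration with Hadoop's
> Configuration.writeXml, for example:
>
>     import java.io.OutputStream
>     import org.apache.hadoop.conf.Configuration
>     import org.apache.hadoop.fs.{FileSystem, Path}
>
>     object ConfSerializer {
>       // Write the client's effective Hadoop configuration into job.xml
>       // under the application staging directory (helper name is mine).
>       def writeJobXml(conf: Configuration, stagingDir: Path): Path = {
>         val jobXml = new Path(stagingDir, "job.xml")
>         val out: OutputStream = FileSystem.get(conf).create(jobXml)
>         try conf.writeXml(out) finally out.close()
>         jobXml
>       }
>     }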
> This patch uses a similar approach. YARN ClientBase serializes the
> configuration and adds it to ClientDistributedCacheManager under the
> link name "job.xml". ClientDistributedCacheManager then uses the Hadoop
> localizer to deliver it to every container started by the application,
> including the one running the Spark driver.
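> The localization side can be sketched with plain YARN APIs (the actual
> patch goes through ClientDistributedCacheManager; the registration
> details below are my assumption):
>
>     import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
>     import org.apache.hadoop.yarn.api.records.{LocalResource,
>       LocalResourceType, LocalResourceVisibility}
>     import org.apache.hadoop.yarn.util.ConverterUtils
>
>     object ConfLocalizer {
>       // Register the uploaded job.xml as an application-visible local
>       // resource; the YARN localizer copies it into the working
>       // directory of every container under the link name "job.xml".
>       def jobXmlResource(fs: FileSystem, jobXml: Path): LocalResource = {
>         val status: FileStatus = fs.getFileStatus(jobXml)
>         LocalResource.newInstance(
>           ConverterUtils.getYarnUrlFromPath(jobXml),
>           LocalResourceType.FILE,
>           LocalResourceVisibility.APPLICATION,
>           status.getLen,
>           status.getModificationTime)
>       }
>     }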
> YARN ClientBase also adds the "SPARK_LOCAL_HADOOPCONF" env variable to
> the AM container request; SparkHadoopUtil.newConfiguration then uses it
> to trigger the new behavior, in which the machine-wide Hadoop
> configuration is merged with the application-specific job.xml (exactly
> as is done in Hadoop MR).
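> The merge itself boils down to the following (a sketch, not the patch
> code; new Configuration() picks up the node-local *-site.xml files, and
> resources added later win on conflicting keys):
>
>     import org.apache.hadoop.conf.Configuration
>     import org.apache.hadoop.fs.Path
>
>     object MergedConf {
>       // Start from the NodeManager-side configuration and overlay the
>       // localized job.xml when the marker variable is set.
>       def newConfiguration(): Configuration = {
>         val conf = new Configuration()
>         if (sys.env.contains("SPARK_LOCAL_HADOOPCONF")) {
>           // "job.xml" is the distributed-cache link name, localized
>           // into the container's working directory.
>           conf.addResource(new Path("job.xml"))
>         }
>         conf
>       }
>     }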
> SparkContext then follows the same approach, adding the
> SPARK_LOCAL_HADOOPCONF env variable to all spawned containers so that
> they use the client-side Hadoop configuration.
> All references to "new Configuration()" that might be executed on the
> YARN cluster side are also changed to use SparkHadoopUtil.get.conf.
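> The call-site change is mechanical, along these lines:
>
>     import org.apache.spark.deploy.SparkHadoopUtil
>
>     // Before: built from node-local files only, ignoring job.xml.
>     // val hadoopConf = new Configuration()
>
>     // After: goes through SparkHadoopUtil, so the job.xml overlay
>     // (when present) is applied on the cluster side as well.
>     val hadoopConf = SparkHadoopUtil.get.conf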
> Please note that this fixes only core Spark, the part I am comfortable
> testing and verifying. I didn't descend into the streaming/shark
> directories, so things might need to be changed there too.



