[
https://issues.apache.org/jira/browse/SPARK-20608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007561#comment-16007561
]
Yuechen Chen commented on SPARK-20608:
--------------------------------------
I tried this solution but ran into some problems.
I configured dfs.nameservices in hdfs-site.xml on my test machine, and the
Hadoop client works: hdfs dfs -ls hdfs://mycluster/path
However, spark-submit fails with the following exception.
17/05/12 10:33:57 INFO Client: Submitting application
application_1487208985618_23772 to ResourceManager
17/05/12 10:33:59 INFO Client: Application report for
application_1487208985618_23772 (state: FAILED)
17/05/12 10:33:59 INFO Client:
client token: N/A
diagnostics: Unable to map logical nameservice URI 'hdfs://mycluster'
to a NameNode. Local configuration does not have a failover proxy provider
configured.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
Should the same nameservice be configured in YARN, i.e. must the remote
nameservice also be configured on the YARN ResourceManager?
I'm not clear about that.
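The diagnostic above says the local (ResourceManager-side) configuration lacks a failover proxy provider for the nameservice. For reference, a typical HDFS HA client configuration for a nameservice named mycluster looks roughly like the following in hdfs-site.xml (host names and ports here are placeholders, not from this cluster):

```xml
<!-- Sketch of an HDFS HA client config; "mycluster" and the hosts are placeholders -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode01.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode02.example.com:8020</value>
</property>
<!-- Without this key the client cannot resolve hdfs://mycluster to a NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

If this block is present only on the submitting machine but not on the nodes where the YARN client/ApplicationMaster resolves the URI, the "Unable to map logical nameservice URI" failure above would be expected.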
Since configuring the nameservice address is the only recommended way to support
HDFS HA, could someone fix this problem (if it is a bug) or add some examples to
the Spark wiki?
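The issue description below suggests a getActiveNameNode(...) helper. As a minimal, self-contained sketch (hypothetical code, not Spark's API): the probe is injected as a function so the selection logic stands alone; in a real application the probe would issue a cheap HDFS call, e.g. FileSystem.getFileStatus on "/", and treat org.apache.hadoop.ipc.StandbyException as "not active".

```scala
// Hypothetical sketch: pick the active namenode from a list of HA candidates.
// The probe is a plain function here so the logic is testable without a cluster;
// a real probe would call the Hadoop FileSystem API and map StandbyException
// (or any connection failure) to false.
object ActiveNameNode {
  def getActiveNameNode(namenodes: Seq[String],
                        isActive: String => Boolean): Option[String] =
    namenodes.find { nn =>
      try isActive(nn)
      catch { case _: Exception => false } // a standby namenode typically throws
    }
}
```

Usage would then match the description's example, e.g. ActiveNameNode.getActiveNameNode(Seq("hdfs://namenode01", "hdfs://namenode02"), probe) before writing the parquet output.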
> Standby namenodes should be allowed to be included in
> yarn.spark.access.namenodes to support HDFS HA
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-20608
> URL: https://issues.apache.org/jira/browse/SPARK-20608
> Project: Spark
> Issue Type: Improvement
> Components: Spark Submit, YARN
> Affects Versions: 2.0.1, 2.1.0
> Reporter: Yuechen Chen
> Priority: Minor
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> If a Spark application needs to access remote namenodes,
> yarn.spark.access.namenodes only has to be configured in the spark-submit
> script, and the Spark client (on YARN) will fetch HDFS credentials periodically.
> If a Hadoop cluster is configured for HA, there is one active namenode
> and at least one standby namenode.
> However, if yarn.spark.access.namenodes includes both active and standby
> namenodes, the Spark application fails, because the standby namenode cannot
> be accessed by Spark (org.apache.hadoop.ipc.StandbyException).
> I think it would cause no harm to allow standby namenodes in
> yarn.spark.access.namenodes, and my Spark application would then be able to
> survive a Hadoop namenode failover.
> HA Examples:
> Spark-submit script:
> yarn.spark.access.namenodes=hdfs://namenode01,hdfs://namenode02
> Spark Application Codes:
> dataframe.write.parquet(getActiveNameNode(...) + hdfsPath)
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)