[
https://issues.apache.org/jira/browse/SPARK-20608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004778#comment-16004778
]
Yuechen Chen commented on SPARK-20608:
--------------------------------------
I know what you mean, and that's exactly right.
But since Spark provides the "yarn.spark.access.namenodes" config, Spark could
recommend two ways to support saving data to a remote HDFS:
1) as you said, configure the remote nameservice mapping in hdfs-site.xml and
submit the job without any extra SparkConf (partly recommended for HA);
2) configure yarn.spark.access.namenodes=remotehdfs (may not support HA well).
For the second way, if standby namenodes are allowed to be included in
yarn.spark.access.namenodes, HA becomes easier to achieve, even though the Spark
application may still fail if the namenode fails over during the job that saves
to the remote HDFS.
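For the first way, a minimal hdfs-site.xml sketch of the remote nameservice mapping; the nameservice name remotehdfs and the hosts namenode01/namenode02 are only placeholders taken from the example in the issue description, and the RPC port 8020 is an assumption:

```xml
<!-- Hypothetical HA client configuration for a remote nameservice.
     All names and the port are illustrative; substitute your cluster's values. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>remotehdfs</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.remotehdfs</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.remotehdfs.nn1</name>
    <value>namenode01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.remotehdfs.nn2</name>
    <value>namenode02:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.remotehdfs</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

With this mapping the application only ever addresses the logical URI hdfs://remotehdfs, and the HDFS client performs failover between the two namenodes itself.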
> Standby namenodes should be allowed to included in
> yarn.spark.access.namenodes to support HDFS HA
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-20608
> URL: https://issues.apache.org/jira/browse/SPARK-20608
> Project: Spark
> Issue Type: Improvement
> Components: Spark Submit, YARN
> Affects Versions: 2.0.1, 2.1.0
> Reporter: Yuechen Chen
> Priority: Minor
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> If a Spark application needs to access remote namenodes,
> yarn.spark.access.namenodes only has to be configured in the spark-submit
> script, and the Spark client (on YARN) will fetch HDFS credentials periodically.
> If a Hadoop cluster is configured for HA, there is one active namenode
> and at least one standby namenode.
> However, if yarn.spark.access.namenodes includes both active and standby
> namenodes, the Spark application will fail, because the standby namenode
> cannot be accessed by Spark and raises org.apache.hadoop.ipc.StandbyException.
> I think it would cause no harm to include standby namenodes in
> yarn.spark.access.namenodes, and my Spark application would then be able to
> sustain a failover of the Hadoop namenode.
> HA Examples:
> Spark-submit script:
> yarn.spark.access.namenodes=hdfs://namenode01,hdfs://namenode02
> Spark Application Codes:
> dataframe.write.parquet(getActiveNameNode(...) + hdfsPath)
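getActiveNameNode(...) in the example above is not an existing Spark API; it is a helper the application would supply. A minimal sketch of its selection logic, written in Python with an injected probe so it is self-contained (in a real job the probe would be a cheap HDFS RPC, such as a getFileStatus("/") call, which fails on a standby node):

```python
class StandbyError(Exception):
    """Stands in for org.apache.hadoop.ipc.StandbyException."""


def get_active_namenode(namenodes, probe):
    """Return the first namenode URI whose probe succeeds.

    namenodes -- candidate URIs, e.g. ["hdfs://namenode01", "hdfs://namenode02"]
    probe     -- callable(uri) that raises StandbyError for a standby node
    """
    last_error = None
    for uri in namenodes:
        try:
            probe(uri)          # cheap RPC against this namenode
            return uri          # it answered, so treat it as active
        except StandbyError as err:
            last_error = err    # standby: try the next candidate
    raise RuntimeError("no active namenode found") from last_error


def fake_probe(uri):
    """Test double: pretend namenode01 is standby and namenode02 is active."""
    if uri == "hdfs://namenode01":
        raise StandbyError(uri)


if __name__ == "__main__":
    active = get_active_namenode(
        ["hdfs://namenode01", "hdfs://namenode02"], fake_probe)
    print(active)  # hdfs://namenode02
```

The application would then build the write path from the returned URI, as in the dataframe.write.parquet example above; note that a failover between the probe and the write can still fail the job, which matches the caveat in the comment.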
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)