[ https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847854#comment-16847854 ]

Dhruve Ashar commented on SPARK-24149:
--------------------------------------

[~mgaido] [~vanzin],

The change this PR introduced tries to explicitly figure out the list of 
namenodes from the Hadoop configs. I think we are duplicating logic that 
already lives in Hadoop, and it makes the code confusing to follow: resolving 
the necessary namenodes should be transparent to the client.

Rationale:

- HDFS Federation is used to store data from two different namespaces on the 
same datanodes (mostly used with unrelated namespaces).

- ViewFS, on the other hand, is used for better namespace management by 
serving different namespaces from different namenodes. But in that case you 
should always access the cluster through {{viewfs://}}, which takes care of 
getting the tokens for you; see the sketch after this list. (Note: this may 
or may not use HDFS federation underneath.)
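
For illustration only, here is a minimal Scala sketch of what that looks like 
from the client side; the mount table name {{clusterX}}, the hosts 
{{nn1}}/{{nn2}} and the paths are all made up:

{code:scala}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
// Hypothetical mount table: two mount points backed by two different namenodes.
conf.set("fs.viewfs.mounttable.clusterX.link./user", "hdfs://nn1.example.com:8020/user")
conf.set("fs.viewfs.mounttable.clusterX.link./data", "hdfs://nn2.example.com:8020/data")
conf.set("fs.defaultFS", "viewfs://clusterX")

// The client just opens a viewfs path; viewfs resolves which namenode backs it.
val fs = FileSystem.get(new URI("viewfs://clusterX/"), conf)
val files = fs.listStatus(new Path("/data"))
{code}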

In either case we should rely on Hadoop to give us the requested namenodes.
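
Concretely, with {{viewfs://}} as the default FS, a plain 
{{FileSystem#addDelegationTokens}} call should already fan out to all the 
backing namenodes, because {{ViewFileSystem}} reports them as child file 
systems. A sketch reusing the hypothetical {{conf}} from above (the renewer 
name is made up):

{code:scala}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

val creds = new Credentials()
val fs = FileSystem.get(conf) // defaultFS is viewfs://clusterX in the sketch above

// Hadoop collects tokens for this FS *and* all of its child file systems,
// so the caller never has to enumerate namenodes itself.
val tokens = fs.addDelegationTokens("yarn", creds)
tokens.foreach(t => println(t.getService))
{code}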

In the use case where we want to access unrelated namespaces (often seen when 
different Hive tables are stored in different namespaces), we already have a 
config, {{spark.yarn.access.hadoopFileSystems}}, to pass in the other 
namenodes, so we really don't need this change.
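
For reference, a sketch of that existing knob; the extra namespace URI below 
is made up:

{code:scala}
import org.apache.spark.SparkConf

// Ask Spark on YARN to fetch delegation tokens for an additional namespace
// besides the defaultFS (URI is hypothetical).
val sparkConf = new SparkConf()
  .set("spark.yarn.access.hadoopFileSystems", "hdfs://nn2.example.com:8020")
{code}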

There was a follow-up PR to fix an issue caused by this behavior, so that the 
FS is obtained only for the specified namenodes. IMHO both of these changes 
are unnecessary and we should revert to the original behavior.

Thoughts, comments?

> Automatic namespaces discovery in HDFS federation
> -------------------------------------------------
>
>                 Key: SPARK-24149
>                 URL: https://issues.apache.org/jira/browse/SPARK-24149
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.4.0
>            Reporter: Marco Gaido
>            Assignee: Marco Gaido
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> Hadoop 3 introduced HDFS federation.
> Spark fails to write to different namespaces when Hadoop federation is turned 
> on and the cluster is secure. This happens because Spark looks for the 
> delegation token only for the configured defaultFS and not for all the 
> available namespaces. A workaround is to use the property 
> {{spark.yarn.access.hadoopFileSystems}}.


