[jira] [Commented] (FLINK-16595) Support extra hadoop filesystem URLs for which to request delegation tokens

Husky Zeng (Jira) Tue, 08 Sep 2020 06:26:20 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192205#comment-17192205
 ]


Husky Zeng commented on FLINK-16595:
------------------------------------

[~rmetzger]

Hi Robert,

Thanks for your help. I'm very glad to work on this issue.

1. Nameservice is a concept in HDFS. As the documentation says, when we have 
a large HDFS cluster, we need to rely on multiple nameservices. In my 
production environment, we add one nameservice per 100 million files. 
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html

2. If there are multiple nameservices in an HDFS cluster, then when we submit 
a Flink job, YARN will collect logs from all of those nameservices. When 
security is enabled, the token is only set for the default nameservice's 
path, and the other nameservices in our cluster will deny YARN's access 
requests.

Here is the code, which only sets the token for the default nameservice's 
path. My solution is to set the token for all of those nameservice paths.
https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L975
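
To illustrate the idea only, here is a minimal sketch of how the list of 
file systems needing a token could be assembled. It is not the Flink code 
linked above: the class name TokenTargets, the method collect, the 
semicolon-separated format and the nameservice names ns1/ns2/ns3 are all 
assumptions made for this example. The actual token request (for example 
through Hadoop's FileSystem#addDelegationTokens) is left out so the sketch 
runs with the JDK alone.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Flink or Hadoop. */
final class TokenTargets {

    /**
     * Returns the default file system followed by every extra file system,
     * without duplicates, in the order they were given.
     *
     * @param defaultFs URI of the default nameservice, e.g. hdfs://ns1
     * @param extraFs   semicolon-separated additional URIs (assumed format; may be null)
     */
    static List<URI> collect(String defaultFs, String extraFs) {
        // LinkedHashSet drops duplicates while keeping insertion order.
        Set<URI> targets = new LinkedHashSet<>();
        targets.add(URI.create(defaultFs.trim()));
        if (extraFs != null) {
            for (String entry : extraFs.split(";")) {
                String trimmed = entry.trim();
                if (!trimmed.isEmpty()) {
                    targets.add(URI.create(trimmed));
                }
            }
        }
        return new ArrayList<>(targets);
    }

    public static void main(String[] args) {
        // Example values only; the nameservice names are made up.
        List<URI> targets = collect("hdfs://ns1", "hdfs://ns2; hdfs://ns3;hdfs://ns1");
        for (URI target : targets) {
            // A real client would request a delegation token for each URI here.
            System.out.println(target);
        }
    }
}
```

With a list like this, the client could loop over every nameservice instead 
of handling only the default one, which is the behaviour I would like to add.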

By the way, this is the first time I am preparing a PR. Could you tell me 
which branch I should choose? If I make a mistake, please tell me directly; 
I shall be very grateful!

Best Wishes,

Husky Zeng

> Support extra hadoop filesystem URLs for which to request delegation tokens
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-16595
>                 URL: https://issues.apache.org/jira/browse/FLINK-16595
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.10.0
>            Reporter: fa zheng
>            Assignee: Husky Zeng
>            Priority: Major
>             Fix For: 1.12.0
>
>
> When the cluster has multiple nameservices, the client can only obtain the 
> token of the default nameservice. We should add a configuration option in 
> YarnConfigOptions to obtain tokens for all nameservices, similar to 
> spark.yarn.access.hadoopFileSystems; see 
> [yarn-specific-kerberos-configuration|https://spark.apache.org/docs/latest/running-on-yarn.html#yarn-specific-kerberos-configuration].
> I encountered this problem when the directory of YARN logs is in another 
> nameservice. It leads to a long wait before the application runs, and the 
> logs cannot be aggregated in the end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
