[
https://issues.apache.org/jira/browse/FLINK-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192205#comment-17192205
]
Husky Zeng commented on FLINK-16595:
------------------------------------
[~rmetzger]
Hi Robert,
Thanks for your help. I'm very glad to work on this issue.
1. Nameservice is a concept in HDFS. As the documentation says, a large HDFS
cluster needs to rely on multiple nameservices. In my production
environment, we add one nameservice per 100 million files.
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html
2. If there are multiple nameservices in the HDFS cluster, then when we submit
a Flink job, YARN will collect logs from all of those nameservices. When
security is enabled, the token is only set for the default nameservice path,
and the other nameservices in our cluster deny YARN's access requests.
Here is the code, which only sets the token for the default nameservice path.
My solution is to set the token for all of those nameservice paths.
https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L975
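To illustrate the idea in point 2, here is a minimal, self-contained sketch. It is not the actual Flink code and does not call any Hadoop API. The class name `TokenTargets`, the method `resolve`, the comma-separated format, and the nameservice names `ns1`..`ns3` are all assumptions made for illustration. It only shows how the default nameservice and a list of extra nameservices could be merged into one list of file systems to request tokens for.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TokenTargets {

    /**
     * Builds the list of file systems to request delegation tokens for:
     * the default file system first, then every extra URL from a
     * comma-separated setting (hypothetical format), without duplicates.
     */
    public static List<URI> resolve(String defaultFs, String extraFs) {
        // LinkedHashSet keeps insertion order and drops repeated entries.
        Set<URI> targets = new LinkedHashSet<>();
        targets.add(URI.create(defaultFs));
        if (extraFs != null) {
            for (String entry : extraFs.split(",")) {
                String trimmed = entry.trim();
                if (!trimmed.isEmpty()) {
                    targets.add(URI.create(trimmed));
                }
            }
        }
        return new ArrayList<>(targets);
    }

    public static void main(String[] args) {
        // Hypothetical nameservice names; real ones come from the HDFS setup.
        List<URI> targets = resolve("hdfs://ns1", "hdfs://ns2, hdfs://ns3,hdfs://ns1");
        for (URI fs : targets) {
            // In the real client, a delegation token would be requested here
            // for each file system instead of only for the default one.
            System.out.println("request token for " + fs);
        }
    }
}
```

With the sample input above, the loop visits `hdfs://ns1`, `hdfs://ns2`, and `hdfs://ns3` once each, in that order. The repeated `hdfs://ns1` entry is dropped.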
By the way, this is the first time I am preparing a PR. Could you tell me which
branch I should choose? If I make a mistake, please tell me directly. I would
be very grateful!
Best Wishes,
Husky Zeng
> Support extra hadoop filesystem URLs for which to request delegation tokens
> ---------------------------------------------------------------------------
>
> Key: FLINK-16595
> URL: https://issues.apache.org/jira/browse/FLINK-16595
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.10.0
> Reporter: fa zheng
> Assignee: Husky Zeng
> Priority: Major
> Fix For: 1.12.0
>
>
> When the cluster has multiple nameservices, the client can only obtain the
> token of the default nameservice. We should add a configuration option in
> YarnConfigOptions to obtain tokens for all nameservices, similar to
> spark.yarn.access.hadoopFileSystems.
> [yarn-specific-kerberos-configuration|https://spark.apache.org/docs/latest/running-on-yarn.html#yarn-specific-kerberos-configuration].
> I encountered this problem when the directory of the YARN logs is on another
> nameservice. It leads to a long wait before the application runs, and the
> logs cannot be aggregated in the end.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)