[ https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973825#comment-14973825 ]
Allen Wittenauer commented on HDFS-7984:
----------------------------------------
bq. No, I think we have. When using existing Credentials#writeTokenStorageFile
... (a bunch of other verbiage)
This demonstrates the big disconnect between what we see and what our users
see.
You don't seriously expect some data scientist or ops person to write code for
this, do you? Yes, there's an API, but where are the command line utilities to
use it? Where's the example code? Oh that's right, we expect everyone to build
their own utilities. Is it because the APIs are the only things that ever stay
stable? Unless we switch Java versions in the middle of a branch. Or, I guess,
at least until we move the classes out of jars. Or, ...
(... and let's not forget that this is in some of the LEAST user-friendly bits
of the source. Even long time Hadoop devs shudder in fear when dealing with
the UGI and token code ...)
bq. Back to the original purpose of the JIRA, I don't know why we need to
specify multiple delegation tokens in one webhdfs://, the delegation token is
used in some service to access HDFS on behalf of user, so one hdfs only needs
one delegation token for one user.
I think you're greatly simplifying the situation. In our use cases, we almost
always have multiple realms in play where cross-realm is not and cannot be
configured. We also don't trust our jobs to work with the given HDFS JARs since
Hadoop backward compatibility is pretty much a joke at this point. (See above)
So there are often two WebHDFS URLs given on the distcp command line.
It's also not unusual to have a *third* cluster in play to act as an
intermediary. So yes, there are definitely real world use cases where
supplying multiple DTs is needed.
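To make the multi-cluster case concrete, here is a simplified, hypothetical model (not Hadoop's actual `Credentials`/`Token` classes) of one token store holding a separately obtained delegation token per cluster, where each webhdfs:// URL on a distcp-style command line picks out its own token by matching token kind and service. The class and method names, host names, and ports are illustrative assumptions; the kind string is assumed to mirror Hadoop's WebHDFS token kind, and Hadoop's real service string may be a resolved IP address rather than a host name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// HYPOTHETICAL sketch, not Hadoop's API: one store, many delegation tokens,
// each webhdfs:// URI selecting the token whose kind and service match it.
public class MultiClusterTokens {

    // Hypothetical stand-in for a delegation token: kind, service ("host:port"), opaque payload.
    public static final class DelegationToken {
        public final String kind;
        public final String service;
        public final String payload;

        public DelegationToken(String kind, String service, String payload) {
            this.kind = kind;
            this.service = service;
            this.payload = payload;
        }
    }

    // Assumed token kind for WebHDFS tokens in this sketch.
    public static final String WEBHDFS_KIND = "WEBHDFS delegation";

    private final List<DelegationToken> tokens = new ArrayList<>();

    public void add(DelegationToken token) {
        tokens.add(token);
    }

    // Returns the first token whose kind is WEBHDFS_KIND and whose service equals
    // the URI's host:port. This sketch requires every URI to carry an explicit port.
    public Optional<DelegationToken> select(URI fsUri) {
        if (fsUri.getPort() == -1) {
            throw new IllegalArgumentException("sketch requires an explicit port: " + fsUri);
        }
        String service = fsUri.getHost() + ":" + fsUri.getPort();
        for (DelegationToken t : tokens) {
            if (t.kind.equals(WEBHDFS_KIND) && t.service.equals(service)) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        MultiClusterTokens store = new MultiClusterTokens();
        // One token per cluster, each obtained in its own realm (hypothetical hosts and ports).
        store.add(new DelegationToken(WEBHDFS_KIND, "nn.realm-a.example.com:50070", "token-A"));
        store.add(new DelegationToken(WEBHDFS_KIND, "nn.realm-b.example.com:50070", "token-B"));

        // Source and destination URLs, as on a distcp command line, resolve to different tokens.
        URI src = URI.create("webhdfs://nn.realm-a.example.com:50070/data/in");
        URI dst = URI.create("webhdfs://nn.realm-b.example.com:50070/data/out");
        System.out.println(store.select(src).map(t -> t.payload).orElse("none"));
        System.out.println(store.select(dst).map(t -> t.payload).orElse("none"));
    }
}
```

Running `main` prints `token-A` and then `token-B`: a single store can serve both ends of the copy because selection is per URL, and adding a third (intermediary) cluster only means adding one more entry.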
bq. user specify delegation token in each webhdfs://,
... which, today, a user can only do via
HADOOP_TOKEN_FILE_LOCATION... which I think everyone agrees is pretty terrible.
Of course, that's after they build an application to actually create a file
with multiple tokens.
bq. We also should not use HADOOP_TOKEN_FILE_LOCATION to solve the problem.
... which ultimately brings us back to this and a handful of other patches
we're working on.
> webhdfs:// needs to support provided delegation tokens
> ------------------------------------------------------
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 3.0.0
> Reporter: Allen Wittenauer
> Assignee: HeeSoo Kim
> Priority: Blocker
> Attachments: HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the
> ability to inject a delegation token rather than having webhdfs initialize its own.
> This would allow for cross-authentication-zone file system accesses.