[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973825#comment-14973825
 ] 

Allen Wittenauer commented on HDFS-7984:
----------------------------------------

bq. No, I think we have. When using existing Credentials#writeTokenStorageFile 
... (a bunch of other verbiage)

This demonstrates the big disconnect between what we see and what our users 
see. 

You don't seriously expect some data scientist or ops person to write code for 
this, do you?  Yes, there's an API, but where are the command line utilities to 
use it?  Where's the example code? Oh that's right, we expect everyone to build 
their own utilities.  Is it because the APIs are the only thing that ever stays 
stable?  Unless we switch Java versions in the middle of a branch. Or, I guess, 
at least until we move the classes out of jars.  Or, ...

(... and let's not forget that this is in some of the LEAST user-friendly bits 
of the source.  Even long-time Hadoop devs shudder in fear when dealing with 
the UGI and token code ...)

bq. Back to the original purpose of the JIRA, I don't know why we need to 
specify multiple delegation tokens in one webhdfs://, the delegation token is 
used in some service to access HDFS on behalf of user, so one hdfs only needs 
one delegation token for one user.

I think you're greatly simplifying the situation.  In our use cases, we almost 
always have multiple realms in play where cross-realm is not and cannot be 
configured. We also don't trust our jobs to work with the given HDFS JARs since 
Hadoop backward compatibility is pretty much a joke at this point.  (See above) 
So there are often two WebHDFS URLs given on the distcp command line.

It's also not unusual to have a *third* cluster in play to act as an 
intermediary.  So yes, there are definitely real-world use cases where 
supplying multiple DTs is needed.

bq. user specify delegation token in each webhdfs://,

... which, today, a user can only do via HADOOP_TOKEN_FILE_LOCATION... which 
I think everyone agrees is pretty terrible.  Of course, that's after they 
build an application to actually create a file with multiple tokens.
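
To make "build an application" concrete, here is a minimal sketch of what such a tool might look like. It assumes each cluster's token has already been fetched into its own token file (for example with `hdfs fetchdt`, run separately against each realm) and only merges those files into one. The class name `MergeTokenFiles` and its argument layout are hypothetical; the `Credentials` calls are the existing Hadoop API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

/** Hypothetical helper: merges several Hadoop token files into one. */
public class MergeTokenFiles {

  /**
   * Reads every input token file, collects all tokens and secret keys
   * into a single Credentials object, and writes it to the output path.
   * Entries sharing an alias are overwritten by later files.
   */
  static Credentials merge(List<Path> inputs, Path output, Configuration conf)
      throws IOException {
    Credentials merged = new Credentials();
    for (Path input : inputs) {
      merged.addAll(Credentials.readTokenStorageFile(input, conf));
    }
    merged.writeTokenStorageFile(output, conf);
    return merged;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.err.println("usage: MergeTokenFiles <output> <input>...");
      System.exit(1);
    }
    // Paths without a scheme resolve against the configured default
    // filesystem, so pass file:/// URIs when the files are local.
    Path output = new Path(args[0]);
    List<Path> inputs = new ArrayList<>();
    for (int i = 1; i < args.length; i++) {
      inputs.add(new Path(args[i]));
    }
    Credentials merged = merge(inputs, output, new Configuration());
    System.out.println("wrote " + merged.numberOfTokens() + " token(s) to " + output);
  }
}
```

The merged file would then be handed to distcp by pointing HADOOP_TOKEN_FILE_LOCATION at it, which is exactly the clunky step being complained about here.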

bq.  We also should not use HADOOP_TOKEN_FILE_LOCATION to solve the problem.

... which ultimately brings us back to this and a handful of other patches 
we're working on.

> webhdfs:// needs to support provided delegation tokens
> ------------------------------------------------------
>
>                 Key: HDFS-7984
>                 URL: https://issues.apache.org/jira/browse/HDFS-7984
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 3.0.0
>            Reporter: Allen Wittenauer
>            Assignee: HeeSoo Kim
>            Priority: Blocker
>         Attachments: HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than have webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
