[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872542#comment-13872542
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5663:
-----------------------------------------------

bq. ... I’m not too sure about - mainly from the perspective of services not 
handling getToken requests correctly if security is disabled

We are moving away from this. In YARN we always use tokens, regardless of the 
security configuration. Oozie needs the tokens to be there in order to work 
correctly.

bq. ... The JobClient currently doesn't do this, at least for HDFS.

Actually, it does if you set the {{MRJobConfig.JOB_NAMENODES}} property. This 
happens in the {{JobSubmitter#populateTokenCache()}} method, which is called by 
{{JobSubmitter#submitJobInternal()}}, which in turn is called by 
{{Job#submit()}}. All of this is on the main execution path, so it always 
happens on a submit. It is independent of split computation.

bq. ... For HBase / HCatalog sources which are outside of the IF/OF for a MR 
job - I don't think we have the capability for fetching tokens, and rely on the 
user providing them up front.

Actually, we are fetching them upfront only because this was needed for MR 
jobs, but MR shouldn’t be a special case. Oozie has the concept of 
{{CredentialsProvider}} for this very same reason, and I think with this JIRA 
we can fix this for the general case.

bq. ... Would this utility class know how to handle all kinds of URIs ?

Yes, based on registered handlers for different schemes, more on this follows.


My thinking on how to address this is to use the same pattern we use today for 
loading/registering {{FileSystem}}, {{CompressionCodec}}, {{TokenRenewer}} and 
{{SecurityInfo}} implementations: the JDK’s {{ServiceLoader}} mechanism, used 
here to load all available implementations of the following interface:

{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;

/**
 * Implementations must be thread-safe.
 */
public interface CredentialsProvider {

  /**
   * Reports the scheme supported by this provider.
   */
  public String getScheme();

  /**
   * Obtains delegation tokens for the provided URIs.
   *
   * @param conf configuration used to initialize the components that connect
   *        to the specified URIs.
   * @param uris URIs of the services to obtain delegation tokens from.
   * @param targetCredentials credentials to add the fetched delegation
   *        tokens to.
   */
  public void obtainCredentials(Configuration conf, URI[] uris,
      Credentials targetCredentials) throws IOException;
}
{code}

Then we would have a {{CredentialsProvider}} class that uses a 
{{ServiceLoader}} to load all the credentials providers available in the 
classpath. The nice thing about the ServiceLoader mechanism is that you drop in 
a JAR file with a service implementation and you don’t have to configure 
anything; it just works, provided the JAR has the META-INF/services/... file 
for it. The loading would be done in a static initialization block of the class.
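
As a rough illustration of the static-block loading described above, here is a 
minimal sketch. It is not the proposed Hadoop code: {{SchemeProvider}} is a 
simplified stand-in for the interface (Hadoop types left out so the snippet 
compiles on its own), and {{ProviderRegistry}} is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

/** Simplified stand-in for the proposed interface (assumption: Hadoop types omitted). */
interface SchemeProvider {
    String getScheme();
}

/** Hypothetical registry class; the name is an assumption made for this sketch. */
final class ProviderRegistry {
    private static final Map<String, SchemeProvider> PROVIDERS;

    static {
        Map<String, SchemeProvider> found = new HashMap<>();
        // ServiceLoader looks for META-INF/services/<interface name> entries on
        // the classpath and instantiates every implementation class listed there.
        for (SchemeProvider provider : ServiceLoader.load(SchemeProvider.class)) {
            found.put(provider.getScheme(), provider);
        }
        PROVIDERS = Collections.unmodifiableMap(found);
    }

    private ProviderRegistry() {
    }

    /** Returns the provider registered for the scheme, or null if there is none. */
    static SchemeProvider forScheme(String scheme) {
        return PROVIDERS.get(scheme);
    }

    /** Number of providers discovered on the classpath. */
    static int size() {
        return PROVIDERS.size();
    }
}
```

With no META-INF/services entry on the classpath the registry is simply empty; 
dropping in a JAR that ships such an entry is enough for its provider to show up.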

The {{CredentialsProvider}} class would have a static method 
{{fetchCredentials(Configuration, URI[], Credentials)}} that groups the URIs by 
scheme and then invokes the corresponding {{CredentialsProvider}} 
implementation for each scheme.
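
The group-by-scheme dispatch could look roughly like the sketch below. The 
names {{TokenFetcher}} and {{CredentialsDispatcher}} are hypothetical, a plain 
{{Map}} stands in for Hadoop's {{Credentials}}, the {{Configuration}} argument 
is left out, and providers are registered by hand on an instance (instead of 
through a static method) to keep the snippet self-contained.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Stand-in for the proposed interface (assumption: a Map replaces Credentials). */
interface TokenFetcher {
    String getScheme();

    void obtainCredentials(URI[] uris, Map<String, String> target) throws IOException;
}

/** Hypothetical dispatcher: groups URIs by scheme, then calls one fetcher per scheme. */
final class CredentialsDispatcher {
    private final Map<String, TokenFetcher> byScheme = new HashMap<>();

    void register(TokenFetcher fetcher) {
        byScheme.put(fetcher.getScheme().toLowerCase(Locale.ROOT), fetcher);
    }

    void fetchCredentials(URI[] uris, Map<String, String> target) throws IOException {
        // Group the URIs by scheme, preserving the order in which schemes first appear.
        Map<String, List<URI>> grouped = new LinkedHashMap<>();
        for (URI uri : uris) {
            String scheme = uri.getScheme();
            if (scheme == null) {
                throw new IOException("URI has no scheme: " + uri);
            }
            grouped.computeIfAbsent(scheme.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                   .add(uri);
        }
        // Hand each group to the fetcher registered for its scheme.
        for (Map.Entry<String, List<URI>> entry : grouped.entrySet()) {
            TokenFetcher fetcher = byScheme.get(entry.getKey());
            if (fetcher == null) {
                throw new IOException("No provider registered for scheme: " + entry.getKey());
            }
            fetcher.obtainCredentials(entry.getValue().toArray(new URI[0]), target);
        }
    }
}
```

Failing on an unknown scheme is a choice made for this sketch; skipping such 
URIs with a warning would be an equally plausible policy.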

Then the different Yarn applications define a property in the conf to indicate 
the URIs of the services to get tokens from, and their client submission code 
fetches them (like the {{JobSubmitter}} does with 
{{MRJobConfig.JOB_NAMENODES}}, but in a general way). Frameworks may choose to 
be smarter (in the case of MR, get the URIs from the splits and the output dir 
and obtain the tokens automatically).
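
On the client side, reading such a property might be sketched as follows. The 
property name {{app.credentials.service-uris}} and the {{ServiceUris}} class 
are made up for illustration, and {{java.util.Properties}} stands in for 
Hadoop's {{Configuration}} so the snippet has no dependencies.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that turns a comma-separated conf property into URIs. */
final class ServiceUris {
    /** Hypothetical property name; each application would define its own. */
    static final String SERVICE_URIS_KEY = "app.credentials.service-uris";

    private ServiceUris() {
    }

    static URI[] fromConf(Properties conf) {
        String raw = conf.getProperty(SERVICE_URIS_KEY, "");
        List<URI> uris = new ArrayList<>();
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            // Skip empty entries, e.g. when the property is unset or has a trailing comma.
            if (!trimmed.isEmpty()) {
                uris.add(URI.create(trimmed));
            }
        }
        return uris.toArray(new URI[0]);
    }
}
```

The resulting array is what the submission code would pass on to the 
credentials-fetching step before submitting the application.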


> Add an interface to Input/Ouput Formats to obtain delegation tokens
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5663
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Michael Weng
>         Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, 
> MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, 
> MAPREDUCE-5663.patch.txt3
>
>
> Currently, delegation tokens are obtained as part of the getSplits / 
> checkOutputSpecs calls to the InputFormat / OutputFormat respectively.
> This works as long as the splits are generated on a node with kerberos 
> credentials. For split generation elsewhere (AM for example), an explicit 
> interface is required.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
