
The "Hbase/HBaseTokenAuthentication" page has been changed by GaryHelmling.
The comment on this change is: Initial draft.
http://wiki.apache.org/hadoop/Hbase/HBaseTokenAuthentication

--------------------------------------------------

New page:
= HBase Token Authentication =
While HBase security now supports Kerberos authentication for client RPC 
connections, this is only part of the puzzle for integration with secure 
Hadoop.  Kerberos authentication is only used for direct client access to HDFS. 
The Hadoop MapReduce framework instead uses a DIGEST-MD5 authentication 
scheme, where the client is granted a signed "delegation token" and a secret 
"token authenticator" (an HMAC-SHA1 of the delegation token identifier, keyed 
with a Name``Node secret key) when a MapReduce job is submitted.  The token and 
authenticator are serialized into a secure location in HDFS, so that the 
spawned Child processes can deserialize the credentials and use them to 
re-authenticate to the Name``Node (NN) as the submitting user.

Since Kerberos credentials are not available in the MapReduce task execution 
context, any attempt by a task to authenticate to HBase via Kerberos will fail. 
As a result, HBase connections will need to support an alternate authentication 
scheme, similar to the one used by the Hadoop MapReduce framework.

=== Goals ===
The main considerations for supporting authentication from MapReduce tasks are:

 1. The implementation should avoid any changes to core Hadoop code.  Any 
changes in Hadoop will require a great deal more review and discussion to 
potentially be accepted, and would necessitate running a forked version of 
Hadoop for some time.
 1. Any changes should be transparent to existing MapReduce user code.  For 
example, we shouldn't require any new APIs to be used for authentication.
 1. Changes to the job submission process, such as using a wrapper or utility 
to submit MapReduce jobs, are preferable to any changes requiring user code 
modifications.

== HBase Authentication Tokens ==
While Hadoop user delegation tokens provide an existing means of Map``Reduce 
task authentication, their reliance on a secret key stored in memory on the 
Name``Node makes them unusable for authentication to HBase.  Fortunately, 
the Hadoop security implementation and the Map``Reduce job submission and 
execution code provide a generalized framework for token handling.  Building on 
top of this, we can provide token-based authentication from MR tasks to HBase 
without any core Hadoop or Map``Reduce changes.

=== Proposal: Adding an HBase user token ===
 1. extend {{{org.apache.hadoop.security.token.TokenIdentifier}}} with our own 
token implementation
 1. implement {{{org.apache.hadoop.security.token.SecretManager}}}
 1. master will generate a secret key for signing and authenticating tokens
   a. will need to be persisted somewhere (zookeeper?) to allow for master 
restarts and failover
   a. will need to distribute the generated secret key to region servers (RS)
     i. could be sent on region server check-ins/heartbeats, though stack is removing those
     i. could be distributed through zookeeper as well
 1. add a helper like {{{TableMapReduceUtil.initJob()}}} to use when submitting 
a new job
   a. will obtain a new token from master
   a. add token to Credentials instance
   a. normal {{{JobClient}}} code will serialize Credentials for MR job
 1. when running MR job, Credentials will be deserialized from secure location
   a. HBaseClient will look in credentials for any relevant tokens

==== Limitations ====
 1. It doesn't appear that we'll be able to use the existing delegation token 
renewal mechanism (but do we really need token renewal?)

=== Token ===
The HBase authentication token is modeled directly after the Hadoop user 
delegation token.  However, we have dropped support for a designated renewer, 
since we will not be able to support HBase token renewal without modifying 
core MapReduce code.  The token will consist of:
 * Token``ID:
   1. Owner ID -- Username that this token will authenticate as
   1. Issue date -- timestamp (in msec) when this token was generated
   1. Expire date -- timestamp (in msec) at which this token expires
   1. Sequence -- to ensure uniqueness
 * Token``Authenticator := HMAC_SHA1(master key, Token``ID)
 * Authentication Token := (Token``ID, Token``Authenticator)
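
A rough, JDK-only sketch of the token layout and authenticator above. The class names ({{{HBaseTokenId}}}, {{{TokenAuthenticator}}}) and the field serialization order are hypothetical; a real implementation would extend {{{TokenIdentifier}}} and do this serialization in its {{{write()}}}/{{{readFields()}}} methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical stand-in for the HBase TokenID fields listed above. */
final class HBaseTokenId {
    final String owner;     // Owner ID: user this token authenticates as
    final long issueDate;   // msec timestamp when the token was generated
    final long expireDate;  // msec timestamp at which the token expires
    final int sequence;     // sequence number, ensures uniqueness

    HBaseTokenId(String owner, long issueDate, long expireDate, int sequence) {
        this.owner = owner;
        this.issueDate = issueDate;
        this.expireDate = expireDate;
        this.sequence = sequence;
    }

    /** Serializes the fields in a fixed order; these bytes are what get signed. */
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(owner);
            out.writeLong(issueDate);
            out.writeLong(expireDate);
            out.writeInt(sequence);
        } catch (IOException e) {
            // In-memory streams do not fail in practice.
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }
}

final class TokenAuthenticator {
    private TokenAuthenticator() {}

    /** TokenAuthenticator := HMAC_SHA1(master key, TokenID) */
    static byte[] compute(byte[] masterKey, HBaseTokenId id) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(masterKey, "HmacSHA1"));
            return mac.doFinal(id.toBytes());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable", e);
        }
    }
}
```

The Authentication Token handed to the client would then be the pair ({{{id.toBytes()}}}, {{{TokenAuthenticator.compute(masterKey, id)}}}).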

==== Authentication ====
HBase token authentication builds on top of the DIGEST-MD5 authentication 
support provided by Hadoop RPC, and follows the same process as Hadoop user 
delegation token authentication by the !NameNode:
 1. Client sends Token``ID to server
 1. Server uses Token``ID and the in-memory master secret key to regenerate 
Token``Authenticator
 1. Server validates the Token``ID and checks that it has not expired
 1. Server and client then use Token``Authenticator as the shared secret to 
negotiate DIGEST-MD5 authentication
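
The server side of these steps might look roughly like the following, again as plain JDK code rather than Hadoop API. In Hadoop the equivalent hook is {{{SecretManager.retrievePassword()}}}, whose result the RPC layer hands to the SASL DIGEST-MD5 callback as the password. The {{{TokenVerifier}}} class and the Token``ID byte layout (same field order as under Token) are assumptions, and the DIGEST-MD5 exchange itself is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical server-side token check following the steps above. */
final class TokenVerifier {
    private final byte[] masterKey; // in-memory master secret key

    TokenVerifier(byte[] masterKey) {
        this.masterKey = masterKey.clone();
    }

    /** Serializes a TokenID as (owner, issue date, expire date, sequence). */
    static byte[] encodeId(String owner, long issueMs, long expireMs, int seq) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(owner);
            out.writeLong(issueMs);
            out.writeLong(expireMs);
            out.writeInt(seq);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }

    /** Step 2: regenerate TokenAuthenticator = HMAC_SHA1(master key, TokenID). */
    byte[] regenerate(byte[] tokenId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(masterKey, "HmacSHA1"));
            return mac.doFinal(tokenId);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Steps 1-3: take the TokenID sent by the client, reject it if it is
     * malformed or expired, then return the regenerated authenticator.
     * Step 4 would use this value as the shared DIGEST-MD5 secret.
     */
    byte[] retrievePassword(byte[] tokenId, long nowMs) {
        if (nowMs >= readExpireDate(tokenId)) {
            throw new SecurityException("token expired");
        }
        return regenerate(tokenId);
    }

    private static long readExpireDate(byte[] tokenId) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(tokenId))) {
            in.readUTF();          // owner ID
            in.readLong();         // issue date
            return in.readLong();  // expire date
        } catch (IOException e) {
            throw new SecurityException("malformed TokenID", e);
        }
    }
}
```

In this scheme the server does not need a table of issued tokens to validate one; the Token``ID plus the master key are enough.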

==== Master Secret Key ====
Authentication relies on a secret key generated at runtime on the master and 
used to generate Authentication Tokens for clients.  Tokens will be generated 
on the master for Kerberos-authenticated clients, but token-based 
authentication will need to be accepted by all masters and region servers in a 
cluster, so the master will need a means to distribute the secret key to the 
other cluster nodes.

The master will also need to write the secret key to persistent storage in 
order for authentication tokens to survive a cluster restart.
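
One way the master might generate the secret key and turn it into bytes for persistence and distribution, sketched with the JDK {{{KeyGenerator}}}. Where those bytes are stored (a !ZooKeeper znode, as suggested in the proposal, or elsewhere) is left open; the {{{MasterKey}}} class and its byte layout are hypothetical, and no !ZooKeeper API is used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.NoSuchAlgorithmException;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical master secret key record: creation time plus raw key bytes. */
final class MasterKey {
    private final long createdMs;
    private final byte[] keyBytes;

    private MasterKey(long createdMs, byte[] keyBytes) {
        this.createdMs = createdMs;
        this.keyBytes = keyBytes;
    }

    /** Run on the master: generate a fresh random HMAC-SHA1 key. */
    static MasterKey generate() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("HmacSHA1");
            return new MasterKey(System.currentTimeMillis(), gen.generateKey().getEncoded());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encodes the key for persistent storage or for shipping to other nodes. */
    byte[] serialize() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(createdMs);
            out.writeInt(keyBytes.length);
            out.write(keyBytes);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }

    /** Run on region servers (or a restarted master): rebuild the key. */
    static MasterKey deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long created = in.readLong();
            byte[] key = new byte[in.readInt()];
            in.readFully(key);
            return new MasterKey(created, key);
        } catch (IOException e) {
            throw new IllegalStateException("corrupt master key record", e);
        }
    }

    long createdMs() { return createdMs; }

    byte[] keyBytes() { return keyBytes.clone(); }

    /** Key object usable with Mac.init(...) for signing or verifying tokens. */
    SecretKey toSecretKey() { return new SecretKeySpec(keyBytes, "HmacSHA1"); }
}
```

Whatever store is chosen has to be readable by every master and region server but protected from ordinary clients, since anyone holding the key can mint valid tokens.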

==== Implementation ====
 1. Extend {{{org.apache.hadoop.security.token.TokenIdentifier}}} with new 
HBase type
 1. Implement {{{org.apache.hadoop.security.token.TokenSelector}}} to pull out 
HBase type tokens
 1. Extend {{{org.apache.hadoop.security.token.SecretManager}}} with 
implementation to generate HBase tokens.  This will be used on HMaster to 
generate HBase tokens, and on HRegionServer to validate tokens for 
authentication.
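
A minimal model of the selector's job, in plain Java: given all of the user's tokens, return the one of the HBase kind for the requested service. It mirrors the general shape of Hadoop's {{{TokenSelector.selectToken(service, tokens)}}}, but {{{SimpleToken}}}, {{{HBaseTokenSelector}}} and the {{{HBASE_AUTH_TOKEN}}} kind name are hypothetical.

```java
import java.util.Collection;
import java.util.Objects;

/** Hypothetical minimal token: kind, service, identifier and password bytes. */
final class SimpleToken {
    final String kind;        // token type, e.g. HDFS delegation vs. HBase auth
    final String service;     // which cluster/service the token is for
    final byte[] identifier;  // serialized TokenID
    final byte[] password;    // TokenAuthenticator

    SimpleToken(String kind, String service, byte[] identifier, byte[] password) {
        this.kind = kind;
        this.service = service;
        this.identifier = identifier;
        this.password = password;
    }
}

/** Picks the HBase token for a given service out of all the user's tokens. */
final class HBaseTokenSelector {
    static final String HBASE_KIND = "HBASE_AUTH_TOKEN"; // hypothetical kind name

    SimpleToken selectToken(String service, Collection<SimpleToken> tokens) {
        if (service == null) {
            return null;
        }
        for (SimpleToken t : tokens) {
            // Skip HDFS delegation tokens and any other kinds in the credentials.
            if (HBASE_KIND.equals(t.kind) && Objects.equals(service, t.service)) {
                return t;
            }
        }
        return null; // no matching token; the caller decides what to do next
    }
}
```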

=== Map Reduce Flow ===
For all of this to work without changes to Hadoop and MapReduce code, we have 
two key requirements:
 1. We must be able to add our own tokens to the MR job Credentials instance at 
job submission time (and the job must be able to serialize our token correctly 
with the rest of the job info)
 1. The Child task executing on each node must deserialize our token and add it 
to the {{{UserGroupInformation}}} instance so it can later be picked up by the 
HBase client for authentication

==== Job Submission ====
 1. Add a new utility class {{{SecureMapReduceUtil}}} with a static helper 
method, something like {{{void initAuthentication(Job job)}}}
   a. Call Master to obtain a new authentication token for the logged-in user
     * A token will only be returned if the user is authenticated via Kerberos, 
as with HDFS
   a. Add HBase token to job credentials -- 
{{{job.getCredentials().addToken(Text alias, Token)}}}
     * {{{FileSystem.getCanonicalServiceName()}}} is used as the alias for HDFS 
delegation tokens, what should we use?
 1. {{{Job.submit()}}} is later called normally, which should serialize token 
with the rest of the job credentials
   a. {{{JobTracker.submitJob()}}} receives the credentials via RPC and adds 
them to a {{{JobInProgress}}} instance added to the job queue
   a. Scheduler will write out the tokens when the job is run.  
{{{JobInProgress.initTasks()}}} -> {{{generateAndStoreTokens()}}} -> 
{{{Credentials.writeTokenStorageFile()}}}
   a. The serialized tokens will be written to {{{<jobdir>/jobToken}}}
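
The submission-side steps boil down to "store our token under an alias, then serialize all tokens to a file that the task side reads back". A JDK-only model of that round trip follows; the {{{TokenStore}}} class, its byte format and the alias strings are hypothetical stand-ins for {{{Credentials.addToken()}}}, {{{Credentials.writeTokenStorageFile()}}} and the task-side load, and the alias question above remains open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the token-holding part of Hadoop's Credentials. */
final class TokenStore {
    static final class StoredToken {
        final byte[] identifier;
        final byte[] password;

        StoredToken(byte[] identifier, byte[] password) {
            this.identifier = identifier.clone();
            this.password = password.clone();
        }
    }

    private final Map<String, StoredToken> tokens = new LinkedHashMap<>();

    /** Submission side: analogue of job.getCredentials().addToken(alias, token). */
    void addToken(String alias, byte[] identifier, byte[] password) {
        tokens.put(alias, new StoredToken(identifier, password));
    }

    StoredToken getToken(String alias) {
        return tokens.get(alias);
    }

    /** Analogue of writing the job's token storage file. */
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(tokens.size());
            for (Map.Entry<String, StoredToken> entry : tokens.entrySet()) {
                out.writeUTF(entry.getKey());
                writeBytes(out, entry.getValue().identifier);
                writeBytes(out, entry.getValue().password);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Task side: analogue of reading the token storage file back in. */
    static TokenStore fromBytes(byte[] data) {
        TokenStore store = new TokenStore();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String alias = in.readUTF();
                byte[] id = readBytes(in);
                byte[] pw = readBytes(in);
                store.addToken(alias, id, pw);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return store;
    }

    private static void writeBytes(DataOutputStream out, byte[] b) throws IOException {
        out.writeInt(b.length);
        out.write(b);
    }

    private static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return b;
    }
}
```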

==== Job Execution on Task Nodes ====
 1. On task start, {{{Child.main()}}} will read in a copy of the tokens from 
the local filesystem (the local path is passed in an environment variable) 
using {{{TokenCache.loadTokens()}}}
 1. Each token is added to the child task {{{UserGroupInformation}}} instance 
used to run the local task
 1. Any HBase connections opened by the task will inherit the same UGI
 1. A {{{TokenInfo}}} annotation on the {{{HRegionInterface}}} and 
{{{HMasterInterface}}} protocol interfaces identifies the HBase 
{{{TokenSelector}}} implementation, which is then used to extract the relevant 
authentication token from the UGI's credentials
 1. Using the HBase authentication token, the authentication process proceeds 
as above
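
The {{{TokenInfo}}} lookup in the task can be modeled with a runtime-retained annotation and reflection. This sketch defines its own {{{TokenInfo}}} annotation, {{{Selector}}} interface and {{{RegionProtocol}}} interface as hypothetical stand-ins for Hadoop's {{{TokenInfo}}}, {{{TokenSelector}}} and {{{HRegionInterface}}}, and represents the UGI's tokens as a simple kind-to-token map.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;

/** Hypothetical selector contract: pick a token from the user's credentials. */
interface Selector {
    String select(Map<String, String> tokensByKind);
}

/** Hypothetical local mirror of Hadoop's TokenInfo annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface TokenInfo {
    Class<? extends Selector> value();
}

/** Selects the HBase token and ignores everything else (e.g. HDFS tokens). */
final class HBaseSelector implements Selector {
    @Override
    public String select(Map<String, String> tokensByKind) {
        return tokensByKind.get("HBASE_AUTH_TOKEN"); // hypothetical kind name
    }
}

/** Stand-in for an RPC protocol interface such as HRegionInterface. */
@TokenInfo(HBaseSelector.class)
interface RegionProtocol {}

final class RpcClientSketch {
    /** At connect time: protocol annotation -> selector class -> token from the UGI. */
    static String findToken(Class<?> protocol, Map<String, String> ugiTokens) {
        TokenInfo info = protocol.getAnnotation(TokenInfo.class);
        if (info == null) {
            return null; // protocol does not declare token support
        }
        try {
            Selector selector = info.value().getDeclaredConstructor().newInstance();
            return selector.select(ugiTokens);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + info.value(), e);
        }
    }
}
```

Keeping the selector class on the protocol interface means the generic RPC client needs no HBase-specific code to find the right token.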
