Repository: oozie
Updated Branches:
  refs/heads/master 90c068574 -> ca72d4430


OOZIE-1853 Improve the Credentials documentation (rkanter)


Project: http://git-wip-us.apache.org/repos/asf/oozie/repo
Commit: http://git-wip-us.apache.org/repos/asf/oozie/commit/ca72d443
Tree: http://git-wip-us.apache.org/repos/asf/oozie/tree/ca72d443
Diff: http://git-wip-us.apache.org/repos/asf/oozie/diff/ca72d443

Branch: refs/heads/master
Commit: ca72d443012680e670e87e460583decd09ffd27c
Parents: 90c0685
Author: Robert Kanter <[email protected]>
Authored: Wed Sep 17 14:59:37 2014 -0700
Committer: Robert Kanter <[email protected]>
Committed: Wed Sep 17 14:59:37 2014 -0700

----------------------------------------------------------------------
 .../site/twiki/DG_ActionAuthentication.twiki    | 117 ++++++++++++
 .../twiki/DG_UnifiedCredentialsModule.twiki     | 187 -------------------
 docs/src/site/twiki/index.twiki                 |   2 +-
 release-log.txt                                 |   1 +
 4 files changed, 119 insertions(+), 188 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/oozie/blob/ca72d443/docs/src/site/twiki/DG_ActionAuthentication.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/DG_ActionAuthentication.twiki 
b/docs/src/site/twiki/DG_ActionAuthentication.twiki
new file mode 100644
index 0000000..e9a2358
--- /dev/null
+++ b/docs/src/site/twiki/DG_ActionAuthentication.twiki
@@ -0,0 +1,117 @@
+<noautolink>
+
+[[index][::Go back to Oozie Documentation Index::]]
+
+---+!! Action Authentication
+
+%TOC%
+
+---++ Background
+
+A secure cluster requires that actions have been authenticated (typically via 
Kerberos).  However, due to the way that Oozie runs
+actions, Kerberos credentials are not easily made available to actions 
launched by Oozie.  For many action types, this is not a
+problem because they are self-contained (beyond core Hadoop components).  For example, a Pig action typically only talks to
+MapReduce and HDFS.  However, some actions require talking to external services (e.g. HCatalog, HBase Region Server, Hive Server 2)
+and in these cases, the actions require some extra configuration in Oozie to authenticate.  To be clear, this extra configuration
+is only required if an action will be talking to these types of external services; running a typical MapReduce, Pig, Hive, etc.
+action will not require any of this.
+
+For these situations, Oozie will have to use its Kerberos credentials to 
obtain "delegation tokens" (think of it like a cookie) on
+behalf of the user from the service in question.  The details of what this means are beyond the scope of this documentation, but
+basically, Oozie needs some extra configuration in the workflow so that it can 
obtain this delegation token.
+
+---++ Oozie Server Configuration
+
+The code to obtain delegation tokens is pluggable so that it is easy to add 
support for different services by simply subclassing
+org.apache.oozie.action.hadoop.Credentials to retrieve a delegation token from 
the service and add it to the Configuration.
+
+Out of the box, Oozie already comes with support for some credential types
+(see [[DG_ActionAuthentication#Built-in_Credentials_Implementations][Built-in 
Credentials Implementations]]).
+The credential classes that Oozie should load are specified by the following property in oozie-site.xml.  The left-hand side of the
+equals sign is the name of the credential type, while the right-hand side is the class that implements it.
+
+<verbatim>
+   <property>
+      <name>oozie.credentials.credentialclasses</name>
+      <value>
+         hcat=org.apache.oozie.action.hadoop.HCatCredentials,
+         hbase=org.apache.oozie.action.hadoop.HbaseCredentials,
+         hive2=org.apache.oozie.action.hadoop.Hive2Credentials
+      </value>
+   </property>
+</verbatim>
+
+---++ Workflow Changes
+
+The user should add a =credentials= section to the top of their workflow that contains one or more =credential= sections.  Each of
+these =credential= sections contains a name for the credential, the type for 
the credential, and any configuration properties
+needed by that type of credential for obtaining a delegation token.  The 
=credentials= section is available in workflow schema
+version 0.3 and later.
+
+For example, the following workflow is configured to obtain an HCatalog 
delegation token, which is given to a Pig action so that the
+Pig action can talk to a secure HCatalog:
+
+<verbatim>
+   <workflow-app xmlns='uri:oozie:workflow:0.4' name='pig-wf'>
+      <credentials>
+         <credential name='my-hcat-creds' type='hcat'>
+            <property>
+               <name>hcat.metastore.uri</name>
+               <value>HCAT_URI</value>
+            </property>
+            <property>
+               <name>hcat.metastore.principal</name>
+               <value>HCAT_PRINCIPAL</value>
+            </property>
+         </credential>
+      </credentials>
+      ...
+      <action name='pig' cred='my-hcat-creds'>
+         <pig>
+            <job-tracker>JT</job-tracker>
+            <name-node>NN</name-node>
+            <configuration>
+               <property>
+                  <name>TESTING</name>
+                  <value>${start}</value>
+               </property>
+            </configuration>
+         </pig>
+      </action>
+      ...
+   </workflow-app>
+</verbatim>
+
+The type of the =credential= is "hcat", which is the type name we gave for the 
HCatCredentials class in oozie-site.xml.  We gave
+the =credential= a name, "my-hcat-creds", which can be whatever you want; we 
then specify cred='my-hcat-creds' in the Pig action,
+so that Oozie will include these credentials with the action.  You can include 
multiple credentials with an action by specifying
+a comma-separated list of =credential= names.  Finally, HCatCredentials requires two properties (the metastore URI and
+principal), which we also specified.
+
+---++ Built-in Credentials Implementations
+
+Oozie currently comes with the following Credentials implementations:
+
+   1. HCatalog and Hive Metastore: 
=org.apache.oozie.action.hadoop.HCatCredentials=
+   1. HBase: =org.apache.oozie.action.hadoop.HbaseCredentials=
+   1. Hive Server 2: =org.apache.oozie.action.hadoop.Hive2Credentials=
+
+HCatCredentials requires these two properties:
+
+   1. =hcat.metastore.principal=
+   1. =hcat.metastore.uri=
+
+*Note:* The HCatalog Metastore and Hive Metastore are one and the same and so 
the "hcat" type credential can also be used to talk
+to a secure Hive Metastore, though the property names would still start with 
"hcat.".
+
+HBase does not require any additional properties since the hbase-site.xml on the Oozie server provides the information necessary to
+obtain a delegation token, though properties can be overridden here if desired.
+
+Hive2Credentials requires these two properties:
+
+   1. =hive2.server.principal=
+   1. =hive2.jdbc.url=
+
+[[index][::Go back to Oozie Documentation Index::]]
+
+</noautolink>
\ No newline at end of file
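
The "Workflow Changes" section of the new page says an action can carry several credentials through a comma-separated list of =credential= names, and the "Built-in Credentials Implementations" section lists the two properties Hive2Credentials requires. A minimal sketch combining the two might look like the following. The credential names, the uppercase placeholder values, and the pairing of both credentials with a single Pig action are assumptions made for illustration and are not part of this commit; the sketch also assumes both the hcat and hive2 types are registered in oozie.credentials.credentialclasses.

```xml
<workflow-app xmlns='uri:oozie:workflow:0.4' name='pig-wf'>
   <credentials>
      <!-- Same HCatalog credential as in the page's own example -->
      <credential name='my-hcat-creds' type='hcat'>
         <property>
            <name>hcat.metastore.uri</name>
            <value>HCAT_URI</value>
         </property>
         <property>
            <name>hcat.metastore.principal</name>
            <value>HCAT_PRINCIPAL</value>
         </property>
      </credential>
      <!-- Hypothetical second credential; name and placeholder values are illustrative -->
      <credential name='my-hive2-creds' type='hive2'>
         <property>
            <name>hive2.server.principal</name>
            <value>HIVE2_PRINCIPAL</value>
         </property>
         <property>
            <name>hive2.jdbc.url</name>
            <value>HIVE2_JDBC_URL</value>
         </property>
      </credential>
   </credentials>
   <!-- start node and other workflow nodes elided -->
   <action name='pig' cred='my-hcat-creds,my-hive2-creds'>
      <pig>
         <job-tracker>JT</job-tracker>
         <name-node>NN</name-node>
         <!-- script, configuration, and transitions elided -->
      </pig>
   </action>
   <!-- remaining nodes elided -->
</workflow-app>
```

Each name in the =cred= attribute must match the =name= of a =credential= element declared in the same workflow.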

http://git-wip-us.apache.org/repos/asf/oozie/blob/ca72d443/docs/src/site/twiki/DG_UnifiedCredentialsModule.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/DG_UnifiedCredentialsModule.twiki 
b/docs/src/site/twiki/DG_UnifiedCredentialsModule.twiki
deleted file mode 100644
index 8503fa2..0000000
--- a/docs/src/site/twiki/DG_UnifiedCredentialsModule.twiki
+++ /dev/null
@@ -1,187 +0,0 @@
-<noautolink>
-
-[[index][::Go back to Oozie Documentation Index::]]
-
----+!! Unified Credentials Module for Oozie
-
-%TOC%
-
----++ Background
-
-Oozie is a workflow scheduling solution for pure Grid processing that needs to 
support the different job types existing in a Grid environment (M/R, PIG, 
Streaming, HDFS, etc.). This scheduling system is data aware, extensible, 
scalable and light-weight. As Oozie is envisioned as a geteway for the grid for 
all the batch processing needs, it has to be aware of all other data processing 
systems which are getting used or will be used in the future for these purposes.
-
-As Secure Hadoop is being used for the data processing then all components 
which have been built on hadoop will be using the same/different model for 
security needs and have their own security model to authenticate users. Now all 
the jobs are going through Oozie for hadoop and then for these systems, Oozie 
should be having a singular interface and support for different implementations 
of these credentials modules. Using those Oozie will authenticate users with 
all those systems and run job seamlessly.
-
-Lets take an Example, User has a system lets call it ABC, which he wants to 
use for running his job. Now it has same policy like hadoop for delegation 
token for running job or that system provides certificates for running that 
job. So user should have way to plugin their system's credentials policy in 
Oozie in order to run those jobs.
-
-This module facilitates users to provide credentials for any other systems 
user may want to use for running their jobs through Oozie if they follow the 
same interface and provide the implementation for those systems.
-
----++ Options
-
-We have couple of options for implementation that are as follows:
-
-   * Introduce separate actions ahead of all workflow applications which need 
specific authentication.
-   * Oozie will get credentials for user based on configuration in each action.
-   
-Following section will discuss about their pros and cons and why we chose the 
second option.
-
----++ Option 1 : Separate Actions for Credentials
-
-In this option Oozie would have introduced multiple authentication actions and 
User will be using those actions ahead of their workflows to first get all the 
necessary credentials and pass those credentials to all the underneath actions 
in the workflows. For Example if user wants to use M/R actions and Pig Actions 
using ABC system then they first need to add ABC Action ahead of MR and Pig 
Actions and then oozie server will run ABC action on the gateway(oozie server) 
and provide all the necessary credentials to following actions.
-
----+++ Shortcomings
-
-This is a nice approach however there are couple of shortcomings with this 
approach those are as follows.
-
-   * In this approach, there would only be one delegation token for all the 
actions in the workflow. However, if workflows have long running actions then 
that token has a potential problem of expiration because of which all the 
subsequent actions will fail due to authentication reason. The one solution to 
this approach is to add more time out which is a static number and will be 
configured at the workflow level (if interface is exposed from underneath 
system; if not then cant be done this way). which will add more load to the 
underneath authenticator servers in case of short running actions.
-   * There is another overhead of running one extra action per workflow.
-   
----++ Option 2 : Getting Credentials in each action
-
-The solution to above mentioned problem is to make each action responsible for 
its own needs, in this case credential token for different systems. Currently 
too it is implemented in such a way for name node and job tracker. Every 
actions gets the token for itself for hdfs.
-
-In this approach user will provide configuration for each workflow for all the 
needed/available credentials modules as well as user will also provide for each 
action, what are the credentials needed. Every action before running will call 
the appropriate credential modules to get the tokens and pass them in job conf 
for the tasks.
-
----+++ Shortcomings
-
-Shortcoming to this approach is every action has to authenticate itself but as 
of now there is no other way we can avoid that because of Token expiration 
problem. Perhaps one workflow may now authenticate many times with the same 
service, and that puts load on the auth service. There could be a 
de-authentication step after the action finishes in the future, if this turns 
out to be a problem.
-
----+++ Assumptions
-
-We have one assumption in this approach which is to pass the delegation tokens 
in the job conf. Without jobconf this approach will not work. However we use 
jobconf for passing the Namenode and Jobtracker token . So without jobconf we 
need to rearchitect that design as well. For now its safe to assume we will 
have job conf.
-
----++ User Interface Changes
-
-User has to add following configuration to their workflow.xml. Please find 
below workflow xml for reference.
-
-<verbatim>
-   <workflow-app xmlns='uri:oozie:workflow:0.1' name='pig-wf'>
-      ...
-      <credentials>
-           <credential name='howlauth' type='hcat'>
-             <property>
-               <name>hcat.metastore.uri</name>
-               <value>HCAT_URI</value>
-             </property>
-             <property> 
-               <name>hcat.metastore.principal</name>
-               <value>HCAT_PRINCIPAL</value>
-             </property>
-           </credential>
-         </credentials>
-         ...
-      <action name='pig' cred='howlauth'>
-        <pig>
-          <job-tracker>JT</job-tracker>
-          <name-node>NN</name-node>
-          <configuration>
-             <property>
-                <name>TESTING</name>
-                <value>${start}</value>
-             </property>
-          </configuration>
-        </pig>
-      </action>
-      ...
-   </workflow-app>
-</verbatim>
-
----++ Using the Unified Credentials Module
-
-If User wants to plugin the new Authentication module for their needs, they 
have to specify that in oozie-site.xml under the
-following property oozie.credentials.credentialclasses with a value of (for 
example)
-ABC=org.apache.oozie.action.hadoop.InsertTestToken
-
-<verbatim>
-   <property>
-      <name>oozie.credentials.credentialclasses</name>
-      <value>ABC=org.apache.oozie.action.hadoop.InsertTestToken</value>
-   </property>
-</verbatim>
-
----+++ Sample Insert Token class implementation
-
-This is the sample class how users can write their Token class
-
-<verbatim>
-public class InsertTestToken extends Credentials {
-.
-public InsertTestToken() {
-  }
-@Override
-public void addtoJobConf(JobConf jobconf, CredentialsProperties props, Context 
context) throws Exception {
-    try {
-        Token<DelegationTokenIdentifier> abctoken = new 
Token<DelegationTokenIdentifier>();
-        jobconf.getCredentials().addToken(new Text("ABC Token"), abctoken);
-        XLog.getLog(getClass()).debug("Added the ABC token in job conf");
-    }
-    catch (Exception e) {
-        XLog.getLog(getClass()).warn("Exception in addtoJobConf", e);
-        throw e;
-    }
-  }
-}
-</verbatim>
-
-This could then be used in a workflow as follows:
-
-<verbatim>
-   <workflow-app xmlns='uri:oozie:workflow:0.1' name='pig-wf'>
-      ...
-      <credentials>
-           <credential name='myauth' type='ABC'>
-             <property>
-               <name>property.for.my.auth</name>
-               <value>some_value</value>
-             </property>
-           </credential>
-         </credentials>
-         ...
-      <action name='pig' cred='myauth'>
-        <pig>
-          <job-tracker>JT</job-tracker>
-          <name-node>NN</name-node>
-          ...
-        </pig>
-      </action>
-      ...
-   </workflow-app>
-</verbatim>
-
----++ Built-in Credentials Implementations
-
-Oozie currently comes with two Credentials implementations:
-
-   1. HCatalog and Hive Metastore: 
=org.apache.oozie.action.hadoop.HCatCredentials=
-   1. HBase: =org.apache.oozie.action.hadoop.HBaseCredentials=
-   1. Hive Server 2: org.apache.oozie.action.hadoop.Hive2Credentials
-
-HCatCredentials requires these two properties:
-
-   1. =hcat.metastore.principal=
-   1. =hcat.metastore.uri=
-
-*Note:* The HCatalog Metastore and Hive Metastore are one and the same and so 
the "hcat" type credential can also be used with the
-Hive action to talk to a secure Hive Metastore.
-
-Hive2Credentials requires these two properties:
-
-   1. hive2.jdbc.url
-   1. hive2.server.principal
-
-To use any of these implementations, they must be set in the 
oozie.credentials.credentialclasses property as described previously
-
-<verbatim>
-   <property>
-      <name>oozie.credentials.credentialclasses</name>
-      <value>
-         hcat=org.apache.oozie.action.hadoop.HCatCredentials,
-         hive=org.apache.oozie.action.hadoop.HbaseCredentials,
-         hive2=org.apache.oozie.action.hadoop.Hive2Credentials
-      </value>
-   </property>
-</verbatim>
-
-[[index][::Go back to Oozie Documentation Index::]]
-
-</noautolink>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/oozie/blob/ca72d443/docs/src/site/twiki/index.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/index.twiki b/docs/src/site/twiki/index.twiki
index 765a6ba..c8ba742 100644
--- a/docs/src/site/twiki/index.twiki
+++ b/docs/src/site/twiki/index.twiki
@@ -49,7 +49,7 @@ Enough reading already? Follow the steps in 
[[DG_QuickStart][Oozie Quick Start]]
    * [[./client/apidocs/index.html][Oozie Client Javadocs]]
    * [[./core/apidocs/index.html][Oozie Core Javadocs]]
    * [[WebServicesAPI][Oozie Web Services API]]
-   * [[DG_UnifiedCredentialsModule][Unified Credentials Module]]
+   * [[DG_ActionAuthentication][Action Authentication]]
 
 ---+++ Action Extensions
 

http://git-wip-us.apache.org/repos/asf/oozie/blob/ca72d443/release-log.txt
----------------------------------------------------------------------
diff --git a/release-log.txt b/release-log.txt
index a4bd20b..5067a5e 100644
--- a/release-log.txt
+++ b/release-log.txt
@@ -1,5 +1,6 @@
 -- Oozie 4.2.0 release (trunk - unreleased)
 
+OOZIE-1853 Improve the Credentials documentation (rkanter)
 OOZIE-1954 Add a way for the MapReduce action to be configured by Java code 
(rkanter)
 OOZIE-2003 Checkstyle issues (rkanter via shwethags)
 OOZIE-1457 Create a Hive Server 2 action (rkanter)
