[ 
https://issues.apache.org/jira/browse/OOZIE-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016038#comment-17016038
 ] 

Mate Juhasz commented on OOZIE-3575:
------------------------------------

Prerequisite oozie-site config changes:
{code}
'oozie.service.HadoopAccessorService.supported.filesystems': '*'
'oozie.service.HadoopAccessorService.nameNode.whitelist': ''
{code}
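In oozie-site.xml these correspond to standard Hadoop-style configuration properties; a sketch of the equivalent XML:

{code:xml}
<!-- Allow Oozie to obtain tokens for any filesystem scheme (including s3a) -->
<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>*</value>
</property>
<!-- Empty whitelist: do not restrict which name-nodes may be used -->
<property>
    <name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
    <value></value>
</property>
{code}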

Added s3a filesystem credentials to a simple distcp workflow:
{code:xml}
    <credentials>
        <credential name="s3a_cred" type="filesystem">
            <property>
                <name>filesystem.path</name>
                <value>${my_s3a_bucket}</value>
            </property>
        </credential>
    </credentials>
    <action name="distcp-node" cred="s3a_cred">
        <distcp xmlns="uri:oozie:distcp-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDir}"/>
            </prepare>
            <arg>-update</arg>
            <arg>-skipcrccheck</arg>
            <arg>${basePath}/${inputDir}/data.txt</arg>
            <arg>${basePath}/${outputDir}/data.txt</arg>
        </distcp>
        <ok to="end"/>
        <error to="fail"/>
    </action>
{code}
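For completeness, a hypothetical job.properties matching the placeholders in the workflow above (the hosts, bucket, and paths are made-up examples, not taken from the test environment):

{code}
nameNode=hdfs://namenode-host:8020
resourceManager=resourcemanager-host:8032
# hypothetical s3a bucket referenced by the filesystem credential
my_s3a_bucket=s3a://example-bucket
# basePath may point to HDFS or to s3a, depending on the scenario
basePath=s3a://example-bucket/user/test
inputDir=input
outputDir=output
oozie.wf.application.path=${nameNode}/user/${user.name}/distcp-wf
{code}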

----
The following scenarios were tested:
* workflow.xml on s3a, basePath on s3a
* workflow.xml on s3a, basePath on HDFS
* workflow.xml on HDFS, basePath on s3a - the extra filesystem credentials are only needed in this case

> Add credential support for cloud file systems
> ---------------------------------------------
>
>                 Key: OOZIE-3575
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3575
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 5.2.0
>            Reporter: Mate Juhasz
>            Assignee: Mate Juhasz
>            Priority: Major
>             Fix For: trunk
>
>         Attachments: OOZIE-3575-v2.patch, OOZIE-3575-v3.patch, 
> OOZIE-3575.patch
>
>
> Oozie by default gathers delegation tokens for the nodes defined in 
> _mapreduce.job.hdfs-servers_ (or _oozie.launcher.mapreduce.job.hdfs-servers_ 
> in case of distcp actions) and for the workflow path.
> Though this implementation works well for HDFS, it does not support cases 
> where the job-related resources we want to access at runtime reside on 
> different file systems, buckets, etc.
> The HDFSCredentials class should be revised to obtain tokens for different 
> cloud storage systems as well.
> *The following scenarios should be addressed:*
> Oozie should obtain delegation tokens when
> * the defaultFs is a cloud file system
> * the workflow.xml resides in cloud storage
> * input/output/auxiliary files referenced from the workflow are in cloud storage
> * (newly introduced feature) the user can define filesystem credentials for the 
> workflow (as is done with hive/hcat etc.) -> this allows the user to handle 
> situations where Oozie cannot determine which tokens are needed at launch time, 
> and to obtain tokens for different cloud storages and buckets as well
> Example for credentials addition:
> {noformat}
> <credential name="aws_auth" type="filesystem">
>   <property>
>     <name>filesystem</name>
>     <value>s3a://qe-s3-bucket-mst</value>
>   </property>
> </credential>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
