[
https://issues.apache.org/jira/browse/OOZIE-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016038#comment-17016038
]
Mate Juhasz commented on OOZIE-3575:
------------------------------------
Prerequisite oozie-site config changes:
{code:java}
'oozie.service.HadoopAccessorService.supported.filesystems': '*'
'oozie.service.HadoopAccessorService.nameNode.whitelist': ''
{code}
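For reference, the same two settings written in the Hadoop-style XML layout that oozie-site.xml uses might look like the sketch below. Only the property names and values come from the test setup above; the comments reflect my understanding of these settings (a "*" allows all filesystem schemes, an empty whitelist disables the NameNode check) and should be verified against oozie-default.xml.
{code:xml}
<configuration>
  <!-- "*" is understood to allow every filesystem scheme, including s3a -->
  <property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>*</value>
  </property>
  <!-- An empty value is understood to disable NameNode whitelist checking -->
  <property>
    <name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
    <value></value>
  </property>
</configuration>
{code}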
Added s3a filesystem credentials to a simple distcp workflow:
{code:xml}
<credentials>
  <credential name='s3a_cred' type='filesystem'>
    <property>
      <name>filesystem.path</name>
      <value>${my_s3a_bucket}</value>
    </property>
  </credential>
</credentials>
<action name="distcp-node" cred='s3a_cred'>
  <distcp xmlns="uri:oozie:distcp-action:1.0">
    <resource-manager>${resourceManager}</resource-manager>
    <name-node>${nameNode}</name-node>
    <prepare>
      <delete path="${outputDir}"/>
    </prepare>
    <arg>-update</arg>
    <arg>-skipcrccheck</arg>
    <arg>${basePath}/${inputDir}/data.txt</arg>
    <arg>${basePath}/${outputDir}/data.txt</arg>
  </distcp>
  <ok to="end"/>
  <error to="fail"/>
</action>
{code}
----
The following scenarios were tested:
* workflow.xml: s3a, basePath: s3a
* workflow.xml: s3a, basePath: HDFS
* workflow.xml: HDFS, basePath: s3a - the extra filesystem credentials are needed only in this case
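As an illustration of the last scenario (workflow.xml on HDFS, data on s3a), a job configuration in the XML form accepted by the Oozie CLI's -config option could look roughly like this. All host names, the bucket name, and the paths are hypothetical placeholders, not the values used in the test; inputDir and outputDir are left out because their values were not given.
{code:xml}
<configuration>
  <!-- Hypothetical cluster endpoints -->
  <property>
    <name>nameNode</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <property>
    <name>resourceManager</name>
    <value>resourcemanager.example.com:8032</value>
  </property>
  <!-- workflow.xml lives on HDFS in this scenario (hypothetical path) -->
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://namenode.example.com:8020/user/alice/distcp-wf</value>
  </property>
  <!-- Bucket referenced by the s3a_cred filesystem credential (hypothetical name) -->
  <property>
    <name>my_s3a_bucket</name>
    <value>s3a://example-bucket</value>
  </property>
  <!-- The distcp source and target are resolved under this s3a base path -->
  <property>
    <name>basePath</name>
    <value>s3a://example-bucket/distcp-test</value>
  </property>
</configuration>
{code}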
> Add credential support for cloud file systems
> ---------------------------------------------
>
> Key: OOZIE-3575
> URL: https://issues.apache.org/jira/browse/OOZIE-3575
> Project: Oozie
> Issue Type: Sub-task
> Components: core
> Affects Versions: 5.2.0
> Reporter: Mate Juhasz
> Assignee: Mate Juhasz
> Priority: Major
> Fix For: trunk
>
> Attachments: OOZIE-3575-v2.patch, OOZIE-3575-v3.patch,
> OOZIE-3575.patch
>
>
> Oozie by default gathers delegation tokens for the nodes defined in
> _mapreduce.job.hdfs-servers_ (or _oozie.launcher.mapreduce.job.hdfs-servers_
> in case of distcp actions) and for the workflow path.
> Though this implementation works well for HDFS, we don't support cases where
> the job-related resources that we want to access at runtime are located on
> different file systems, buckets, etc.
> The HDFSCredentials class should be revised to handle getting tokens for
> different cloud storages.
> *The following scenarios should be addressed:*
> Oozie should obtain a delegation token when
> * the defaultFs is cloud
> * the workflow.xml is in cloud
> * input/output/auxiliary files referred from workflow are in cloud
> * (newly introduced feature) the user can define filesystem credentials for the
> workflow (as is done with hive/hcat, etc.). This lets the user handle
> the situation where Oozie cannot decide by default which tokens are needed at
> launch time, and also makes it possible to get tokens for different cloud
> storages and buckets
> Example for credentials addition:
> {noformat}
> <credential name="aws_auth" type="filesystem">
> <property>
> <name>filesystem</name>
> <value>s3a://qe-s3-bucket-mst</value>
> </property>
> </credential>
> {noformat}