[ 
https://issues.apache.org/jira/browse/OOZIE-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Juhasz updated OOZIE-3575:
-------------------------------
    Description: 
Oozie by default gathers delegation tokens for the nodes defined in 
_mapreduce.job.hdfs-servers_ (or _oozie.launcher.mapreduce.job.hdfs-servers_ in 
case of distcp actions) and for the workflow path.

Though this implementation is good for hdfs, we dont support occasions where 
the job related resources, which we want to access in runtime are present on 
different file systems/buckets etc...

The HDFSCredentials class should be revised to handle getting tokens for 
different cloud storages.

*The following scenarios should be addressed:*
Oozie should obtain delegation token in case

* the defaultFs is cloud
* the workload.xml is in cloud
* input/output/auxiliary files referred from workflow are in cloud
* (newly introduced feature) user could define filesystem credentials for the 
workflow (as its done with hive/hcat etc..) -> this would allow the user to 
handle the situation where Oozie could not decide which tokens are needed at 
launch time by default and could also get tokens for different cloud storages 
and buckets as well

Example for credentials addition:
{noformat}
<credential name="aws_auth" type="filesystem">
  <property>
    <name>filesystem.path</name>
    <value>s3a://qe-s3-bucket-mst</value>
  </property>
</credential>
{noformat}


  was:
Oozie by default gathers delegation tokens for the nodes defined in 
_mapreduce.job.hdfs-servers_ (or _oozie.launcher.mapreduce.job.hdfs-servers_ in 
case of distcp actions) and for the workflow path.

Though this implementation is good for hdfs, we dont support occasions where 
the job related resources, which we want to access in runtime are present on 
different file systems/buckets etc...

The HDFSCredentials class should be revised to handle getting tokens for 
different cloud storages.

*The following scenarios should be addressed:*
Oozie should obtain delegation token in case

* the defaultFs is cloud
* the workload.xml is in cloud
* input/output/auxiliary files referred from workflow are in cloud
* (newly introduced feature) user could define filesystem credentials for the 
workflow (as its done with hive/hcat etc..) -> this would allow the user to 
handle the situation where Oozie could not decide which tokens are needed at 
launch time by default and could also get tokens for different cloud storages 
and buckets as well

Example for credentials addition:
{noformat}
<credential name="aws_auth" type="filesystem">
  <property>
    <name>filesystem</name>
    <value>s3a://qe-s3-bucket-mst</value>
  </property>
</credential>
{noformat}



> Add credential support for cloud file systems
> ---------------------------------------------
>
>                 Key: OOZIE-3575
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3575
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 5.2.0
>            Reporter: Mate Juhasz
>            Assignee: Mate Juhasz
>            Priority: Major
>             Fix For: 5.3.0
>
>         Attachments: OOZIE-3575-v2.patch, OOZIE-3575-v3.patch, 
> OOZIE-3575.patch
>
>
> Oozie by default gathers delegation tokens for the nodes defined in 
> _mapreduce.job.hdfs-servers_ (or _oozie.launcher.mapreduce.job.hdfs-servers_ 
> in case of distcp actions) and for the workflow path.
> Though this implementation is good for hdfs, we dont support occasions where 
> the job related resources, which we want to access in runtime are present on 
> different file systems/buckets etc...
> The HDFSCredentials class should be revised to handle getting tokens for 
> different cloud storages.
> *The following scenarios should be addressed:*
> Oozie should obtain delegation token in case
> * the defaultFs is cloud
> * the workload.xml is in cloud
> * input/output/auxiliary files referred from workflow are in cloud
> * (newly introduced feature) user could define filesystem credentials for the 
> workflow (as its done with hive/hcat etc..) -> this would allow the user to 
> handle the situation where Oozie could not decide which tokens are needed at 
> launch time by default and could also get tokens for different cloud storages 
> and buckets as well
> Example for credentials addition:
> {noformat}
> <credential name="aws_auth" type="filesystem">
>   <property>
>     <name>filesystem.path</name>
>     <value>s3a://qe-s3-bucket-mst</value>
>   </property>
> </credential>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to