[
https://issues.apache.org/jira/browse/HADOOP-18154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ju Clarysse updated HADOOP-18154:
---------------------------------
Description:
We are using the latest version of
[delta-sharing|https://github.com/delta-io/delta-sharing] which takes advantage
of
[hadoop-aws|https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html]
(S3A) connector in [Hadoop release version
2.10.1|https://github.com/apache/hadoop/tree/rel/release-2.10.1] to mount an
AWS S3 File System. In our particular setup, all services are operated in
Amazon Elastic Kubernetes Service (EKS) and need to comply with the AWS security
concept [IAM roles for service
accounts|https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html]
(IRSA).
Because the [Delta sharing S3 connection|https://github.com/delta-io/delta-sharing#s3]
doesn't offer any IRSA support, we patched hadoop-aws-2.10.1 to
address this need via a new credentials provider class,
org.apache.hadoop.fs.s3a.OIDCTokenCredentialsProvider. We also upgraded the
aws-java-sdk-bundle dependency to its latest version, 1.12.167, because the [AWS
WebIdentityTokenCredentialsProvider
class|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/WebIdentityTokenCredentialsProvider.html]
is not available in the original version, 1.11.271.
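For context: as far as we know, the AWS SDK's WebIdentityTokenCredentialsProvider, when created with its defaults, takes the role ARN and the token file location from the environment variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, which EKS injects into pods that use IRSA. The following is a minimal, standard-library-only sketch of a preflight check for those two variables. It is not part of the patch; the names IrsaEnvCheck and problems are hypothetical and chosen only for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of hadoop-aws or the AWS SDK. */
final class IrsaEnvCheck {
    // Environment variables that EKS injects into pods using IRSA.
    static final String ROLE_ARN = "AWS_ROLE_ARN";
    static final String TOKEN_FILE = "AWS_WEB_IDENTITY_TOKEN_FILE";

    /** Returns a list of problems; an empty list means the settings look usable. */
    static List<String> problems(Map<String, String> env) {
        List<String> problems = new ArrayList<>();

        String roleArn = env.get(ROLE_ARN);
        if (roleArn == null || roleArn.trim().isEmpty()) {
            problems.add(ROLE_ARN + " is not set");
        }

        String tokenFile = env.get(TOKEN_FILE);
        if (tokenFile == null || tokenFile.trim().isEmpty()) {
            problems.add(TOKEN_FILE + " is not set");
        } else if (!Files.isReadable(Paths.get(tokenFile))) {
            // The variable is set, but the projected token cannot be read.
            problems.add(TOKEN_FILE + " points to an unreadable file: " + tokenFile);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = problems(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("IRSA environment looks complete");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Running such a check inside the pod before mounting the file system can help tell a missing service account annotation apart from a genuine S3A configuration problem.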
We believe that other delta-sharing users could benefit from this contribution
in the short term. Sooner or later, the delta-sharing maintainers will have to
upgrade their project to a more recent, and probably more widely used, version
of hadoop-aws. The effort needed to promote this change is probably low.
Additional note: the AWS WebIdentityTokenCredentialsProvider class is directly
supported by Spark applications submitted with the configuration properties
`spark.hadoop.fs.s3a.aws.credentials.provider` and
`spark.kubernetes.authenticate.submission.oauthToken`
([doc|https://spark.apache.org/docs/latest/running-on-kubernetes.html#spark-properties]).
Bringing this support to Hadoop will therefore primarily interest non-Spark
users.
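To illustrate the Spark case, here is a minimal sketch that assembles the two properties named above into `--conf` arguments for spark-submit. The class name SparkIrsaConf and its methods are hypothetical, the token value is a placeholder, and we assume the provider is referenced by its fully qualified class name, com.amazonaws.auth.WebIdentityTokenCredentialsProvider. Whether a given Spark and Hadoop combination accepts that provider depends on the bundled SDK version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that only formats arguments; it does not call Spark. */
final class SparkIrsaConf {
    /** The two properties from the note above, in a stable order. */
    static Map<String, String> properties(String oauthToken) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("spark.hadoop.fs.s3a.aws.credentials.provider",
                  "com.amazonaws.auth.WebIdentityTokenCredentialsProvider");
        props.put("spark.kubernetes.authenticate.submission.oauthToken", oauthToken);
        return props;
    }

    /** Turns each property into a "--conf key=value" argument pair. */
    static List<String> toSubmitArgs(Map<String, String> props) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // "<token>" is a placeholder, not a real credential.
        System.out.println(String.join(" ", toSubmitArgs(properties("<token>"))));
    }
}
```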
was:
We are using the latest version of
[delta-sharing|https://github.com/delta-io/delta-sharing] which takes advantage
of
[hadoop-aws|https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html]
(S3A) connector in [Hadoop release version
2.10.1|https://github.com/apache/hadoop/tree/rel/release-2.10.1] to mount an
AWS S3 File System. In our particular setup, all services are operated in
Amazon Elastic Kubernetes Service (EKS) and need to comply to the AWS security
concept [IAM roles for service
accounts|https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html]
(IRSA).
As [Delta sharing S3 connection|https://github.com/delta-io/delta-sharing#s3]
doesn't offer any corresponding support, we patched hadoop-aws-2.10.1 to
address this need via a new credentials provider class
org.apache.hadoop.fs.s3a.OIDCTokenCredentialsProvider. We also upgraded
dependency aws-java-sdk-bundle to its latest version 1.12.167 as [AWS
WebIdentityTokenCredentialsProvider
class|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/WebIdentityTokenCredentialsProvider.html%E2%80%A6]
was not yet available in original version 1.11.271.
We believe that other delta-sharing users could benefit from this short-term
contribution. Sooner or later, delta-sharing owners will then have to upgrade
to a more recent version of hadoop-aws that is probably more widely used. The
effort to promote this change could be limited while the opportunity to make
other folks happy could be great.
> Extend S3A to WebIdentity
> -------------------------
>
> Key: HADOOP-18154
> URL: https://issues.apache.org/jira/browse/HADOOP-18154
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 2.10.1
> Reporter: Ju Clarysse
> Assignee: Ju Clarysse
> Priority: Major