[
https://issues.apache.org/jira/browse/HADOOP-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545665#comment-17545665
]
Brandon commented on HADOOP-17372:
----------------------------------
Hello [[email protected]], thank you for your work on S3A.
In our Spark jobs, we use a custom AWS credentials provider class which is
bundled into the Spark application jar. This worked on Hadoop 3.2.1, but
unfortunately this class can't be found after upgrading to Hadoop 3.3.3. This
surfaces as a ClassNotFoundException in S3AFileSystem's initialization:
{noformat}
java.io.IOException: From option fs.s3a.aws.credentials.provider
java.lang.ClassNotFoundException: Class [custom AWS credentials provider class] not found
  at org.apache.hadoop.fs.s3a.S3AUtils.loadAWSProviderClasses(S3AUtils.java:657)
  at org.apache.hadoop.fs.s3a.S3AUtils.buildAWSProviderList(S3AUtils.java:680)
  at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:631)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:877)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:534)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
{noformat}
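For context, the provider is wired in through the standard S3A option; in our Hadoop configuration it looks roughly like this (the class name below is a hypothetical stand-in for our redacted provider):
{code:xml}
<!-- Hypothetical stand-in for the actual provider class name. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.example.MyCredentialsProvider</value>
</property>
{code}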
We were able to track this down to the change in this ticket. I believe what's
happening here is:
* The S3AFileSystem class is provided by a jar on disk. That jar is added to the Java classpath via the normal java command-line option, so the classloader of S3AFileSystem is the Java application classloader.
* The Spark application jar which contains our AWS credentials provider class is downloaded at runtime by Spark and then "patched into" the classpath via Spark's mutable classloader.
* Therefore, classes in the application jar are not visible to the classloader that loaded S3AFileSystem (see the sketch below).
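Here is a minimal sketch of the mismatch, assuming a hypothetical provider class com.example.MyCredentialsProvider that exists only in the Spark application jar:
{code:java}
import org.apache.hadoop.fs.s3a.S3AFileSystem;

// Minimal sketch of the classloader mismatch. Assumes a hypothetical provider
// class "com.example.MyCredentialsProvider" that exists only in the Spark
// application jar, i.e. it is visible to Spark's mutable (context)
// classloader but not to the JVM application classloader.
public class ClassLoaderMismatchSketch {
  public static void main(String[] args) throws Exception {
    // The loader that loaded S3AFileSystem: the plain Java application
    // classloader, since hadoop-aws sits on the command-line classpath.
    ClassLoader fsLoader = S3AFileSystem.class.getClassLoader();

    // The thread context classloader: Spark's mutable classloader, which
    // also knows about the downloaded application jar.
    ClassLoader ctxLoader = Thread.currentThread().getContextClassLoader();

    // Resolvable through the context classloader...
    Class.forName("com.example.MyCredentialsProvider", false, ctxLoader);

    // ...but not through S3AFileSystem's own loader, which is what the
    // provider lookup uses after this change: ClassNotFoundException.
    Class.forName("com.example.MyCredentialsProvider", false, fsLoader);
  }
}
{code}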
In the meantime, I think our most reasonable path forward is to pull the custom AWS credentials provider out of the application jar, install it in its own jar on disk, and add that jar to the java command-line classpath alongside hadoop-aws itself. Not too bad, but certainly more complicated than the prior setup on Hadoop 3.2.1.
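For concreteness, that workaround would look something like the following spark-submit invocation (the jar path and provider class name are hypothetical stand-ins, and the jar must exist at that path on every node):
{noformat}
spark-submit \
  --conf spark.driver.extraClassPath=/opt/jars/my-credentials-provider.jar \
  --conf spark.executor.extraClassPath=/opt/jars/my-credentials-provider.jar \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.example.MyCredentialsProvider \
  ...
{noformat}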
> S3A AWS Credential provider loading gets confused with isolated classloaders
> ----------------------------------------------------------------------------
>
> Key: HADOOP-17372
> URL: https://issues.apache.org/jira/browse/HADOOP-17372
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Fix For: 3.3.1
>
>
> Problem: exception in loading S3A credentials for an FS, "Class class
> com.amazonaws.auth.EnvironmentVariableCredentialsProvider does not implement
> AWSCredentialsProvider"
> Location: S3A + Spark dataframes test
> Hypothesised cause:
> Configuration.getClasses() uses the context classloader, and with the spark
> isolated CL that's different from the one the s3a FS uses, so it can't load
> AWS credential providers.