[ https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562117#comment-17562117 ]
Daniel Carl Jones edited comment on SPARK-38958 at 7/4/22 10:06 AM:
--------------------------------------------------------------------

There is a workaround which may help in the short term, originally shared by Asier in HADOOP-14661. S3A, the underlying connector provided by the Hadoop project for using S3 as a filesystem, has its own factory class for configuring the S3 client. You can extend it today and set a static header by adding a class like the one below. It needs to be compiled and added to your classpath when using Spark/Hadoop.

{code:java}
import java.io.IOException;
import java.net.URI;

import com.amazonaws.services.s3.AmazonS3;
import org.apache.hadoop.fs.s3a.DefaultS3ClientFactory;
import org.apache.hadoop.fs.s3a.S3ClientFactory.S3ClientCreationParameters;

public class CustomS3ClientFactory extends DefaultS3ClientFactory {
    @Override
    public AmazonS3 createS3Client(final URI uri,
            final S3ClientCreationParameters parameters) throws IOException {
        // Attach a static header to every request made by clients from this factory.
        parameters.withHeader("my-header-key", "my-header-value");
        return super.createS3Client(uri, parameters);
    }
}
{code}

In your Spark application, you should then be able to update your configuration to point to this new factory:

{code:java}
spark.sparkContext.hadoopConfiguration.set("fs.s3a.s3.client.factory.impl", "your.package.CustomS3ClientFactory")
{code}

I'm not certain whether this is a permanent configuration option made available to users, but it has been around for six years at this point.
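If you would rather not change application code, the same key can also be set at submit time: Spark copies any {{spark.hadoop.*}} property into the Hadoop Configuration it hands to connectors such as S3A. A minimal sketch (the jar name {{custom-s3-factory.jar}} is a placeholder for however you package the class above):

{code}
spark-submit \
  --jars custom-s3-factory.jar \
  --conf spark.hadoop.fs.s3a.s3.client.factory.impl=your.package.CustomS3ClientFactory \
  your-app.jar
{code}

Either way, the factory class must be on the executor classpath as well as the driver's, which the {{--jars}} option takes care of.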
> Override S3 Client in Spark Write/Read calls
> --------------------------------------------
>
>                 Key: SPARK-38958
>                 URL: https://issues.apache.org/jira/browse/SPARK-38958
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Hershal
>            Priority: Major
>
> Hello,
> I have been working to use Spark to read and write data to S3. Unfortunately, there are a few S3 headers that I need to add to my Spark read/write calls. After much looking, I have not found a way to replace the S3 client that Spark uses to make the read/write calls, nor a configuration that allows me to pass in S3 headers. Here is an example of some common S3 request headers (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html).
> Does there already exist functionality to add S3 headers to Spark read/write calls, or to pass in a custom client that would send these headers on every read/write request? I appreciate the help and feedback.
>
> Thanks,

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org