[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562117#comment-17562117
 ] 

Daniel Carl Jones commented on SPARK-38958:
-------------------------------------------

There is a workaround which may help in the short term, originally shared by 
Asier in HADOOP-14661. The S3 client used by S3A (the connector provided by 
the Hadoop project for using S3 as a filesystem) is built by its own factory 
class. You can extend that factory today and set a static header by adding a 
class like the one below. It needs to be compiled and added to your classpath 
when using Spark/Hadoop.
{code:java}
import java.io.IOException;
import java.net.URI;

import com.amazonaws.services.s3.AmazonS3;
import org.apache.hadoop.fs.s3a.DefaultS3ClientFactory;
import org.apache.hadoop.fs.s3a.S3ClientFactory.S3ClientCreationParameters;

public class CustomS3ClientFactory extends DefaultS3ClientFactory {
    @Override
    public AmazonS3 createS3Client(final URI uri,
        final S3ClientCreationParameters parameters) throws IOException {
        // Attach a static header to every request made by the created client.
        parameters.withHeader("my-header-key", "my-header-value");
        return super.createS3Client(uri, parameters);
    }
}
{code}
In your Spark application, you should then be able to update your configuration 
to point to this new factory:
{code:java}
spark.sparkContext.hadoopConfiguration.set("fs.s3a.s3.client.factory.impl", 
"your.package.CustomS3ClientFactory") 
{code}
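Alternatively, the same property can be set at submit time through Spark's documented {{spark.hadoop.*}} passthrough prefix, which copies the setting into the Hadoop configuration. This is a sketch; the JAR names ({{custom-s3-factory.jar}}, {{your-app.jar}}) are placeholders for your own artifacts:
{code:bash}
# Ship the factory class to driver and executors, then point S3A at it.
spark-submit \
  --jars custom-s3-factory.jar \
  --conf spark.hadoop.fs.s3a.s3.client.factory.impl=your.package.CustomS3ClientFactory \
  your-app.jar
{code}
Setting it via {{--conf}} (or in {{spark-defaults.conf}}) avoids changing application code, which can be convenient when the header only applies in some environments.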

> Override S3 Client in Spark Write/Read calls
> --------------------------------------------
>
>                 Key: SPARK-38958
>                 URL: https://issues.apache.org/jira/browse/SPARK-38958
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Hershal
>            Priority: Major
>
> Hello,
> I have been working on using Spark to read and write data to S3. 
> Unfortunately, there are a few S3 headers that I need to add to my Spark 
> read/write calls. After much searching, I have not found a way to replace 
> the S3 client that Spark uses to make the read/write calls, nor a 
> configuration that allows me to pass in S3 headers. Here is an example of 
> some common S3 request headers: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html].
> Does functionality already exist to add S3 headers to Spark read/write 
> calls, or to pass in a custom client that would send these headers on every 
> read/write request? I appreciate the help and feedback.
>  
> Thanks,



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
