[
https://issues.apache.org/jira/browse/HADOOP-17197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175411#comment-17175411
]
Steve Loughran commented on HADOOP-17197:
-----------------------------------------
no. really no. really, really no. really, really, really no
A key rationale is, say, Spark, which doesn't just use the S3A bits: it has a
spark-kinesis module, and there's a Spark streaming connector which uses
Spark's SQS queue to send notifications to Spark when monitored files change.
Since we moved to a single shared shaded JAR, we have eliminated all problems
related to the AWS SDK and its transitive dependencies conflicting with Hadoop
requirements. And because we have a complete JAR, we do not have to worry about
classpath/versioning issues with those external downstream components.
I seem to be the person where all S3A classpath issues end up. I am still
dealing with older versions of Hadoop and their Joda-Time and Java 8
inconsistencies. There is no way I want to reinstate that problem.
I understand your concerns about the size of the Docker image. However, I'm
afraid you have to recognise and accept that this is the price of having a
complete and functional high-performance connector with AWS S3 and other
services. You are free to implement your own. I will point you at the Presto
one, whose wonderful minimalism appeals to me.
But for the S3A connector: it ships with the AWS SDK shaded and complete.
Of course, you can also think about doing something purely for the Impala
Docker images. Let me know how that gets on.
> Decrease size of s3a dependencies
> ---------------------------------
>
> Key: HADOOP-17197
> URL: https://issues.apache.org/jira/browse/HADOOP-17197
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sahil Takiar
> Priority: Major
>
> S3A currently has a dependency on the aws-java-sdk-bundle, which includes the
> SDKs for all AWS services. The jar file for the current version is about 120
> MB, but continues to grow (the latest is about 170 MB). Organic growth is
> expected as more and more AWS services are created.
> The aws-java-sdk-bundle jar file is shaded as well, so it includes all
> transitive dependencies.
> It would be nice if S3A could depend on smaller jar files in order to
> decrease the size of jar files pulled in transitively by clients. Decreasing
> the size of dependencies is particularly important for Docker images, where
> image pull times can be affected by image size.
> One solution here would be for S3A to publish its own shaded jar which
> includes the SDKs for all needed AWS Services (e.g. S3, DynamoDB, etc.) along
> with the transitive dependencies for the individual SDKs.