[
https://issues.apache.org/jira/browse/HADOOP-19696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029549#comment-18029549
]
Steve Loughran commented on HADOOP-19696:
-----------------------------------------
And here is the full set of jars if you pull in everything. The aws bundle.jar
dominates, but the rest adds up.
{code}
104K aliyun-java-core-0.2.11-beta.jar
190K aliyun-java-sdk-core-4.5.10.jar
160K aliyun-java-sdk-kms-2.11.0.jar
216K aliyun-java-sdk-ram-3.1.0.jar
907K aliyun-sdk-oss-3.18.1.jar
2.4M analyticsaccelerator-s3-1.3.0.jar
3.0K animal-sniffer-annotations-1.24.jar
3.0K annotations-4.1.1.4.jar
49K api-common-2.47.2.jar
7.3K auto-value-annotations-1.11.0.jar
111K azure-data-lake-store-sdk-2.3.9.jar
10K azure-keyvault-core-1.0.0.jar
796K azure-storage-7.0.1.jar
612M bundle-2.29.52.jar
232K checker-qual-3.49.0.jar
346K commons-codec-1.15.jar
69K commons-logging-1.3.0.jar
4.3M conscrypt-openjdk-uber-2.5.2.jar
8.3M cos_api-bundle-5.6.19.jar
18K detector-resources-support-0.33.0.jar
317K dom4j-2.1.4.jar
19K error_prone_annotations-2.36.0.jar
39K exporter-metrics-0.33.0.jar
4.6K failureaccess-1.0.2.jar
52K gapic-google-cloud-storage-v2-2.52.0.jar
424K gax-2.64.2.jar
154K gax-grpc-2.64.2.jar
162K gax-httpjson-2.64.2.jar
295K google-api-client-2.7.2.jar
252K google-api-services-storage-v1-rev20250420-2.0.0.jar
8.2K google-auth-library-credentials-1.33.1.jar
294K google-auth-library-oauth2-http-1.33.1.jar
137K google-cloud-core-2.54.2.jar
16K google-cloud-core-grpc-2.54.2.jar
15K google-cloud-core-http-2.54.2.jar
249K google-cloud-monitoring-3.52.0.jar
1.3M google-cloud-storage-2.52.0.jar
289K google-http-client-1.46.3.jar
11K google-http-client-apache-v2-1.46.3.jar
19K google-http-client-appengine-1.46.3.jar
13K google-http-client-gson-1.46.3.jar
9.4K google-http-client-jackson2-1.46.3.jar
80K google-oauth-client-1.37.0.jar
316K grpc-alts-1.70.0.jar
316K grpc-api-1.70.0.jar
14K grpc-auth-1.70.0.jar
293B grpc-context-1.70.0.jar
639K grpc-core-1.70.0.jar
30K grpc-google-cloud-storage-v2-2.52.0.jar
15K grpc-googleapis-1.70.0.jar
175K grpc-grpclb-1.70.0.jar
39K grpc-inprocess-1.70.0.jar
9.3M grpc-netty-shaded-1.70.0.jar
67K grpc-opentelemetry-1.70.0.jar
5.2K grpc-protobuf-1.70.0.jar
7.7K grpc-protobuf-lite-1.70.0.jar
248K grpc-rls-1.70.0.jar
928K grpc-services-1.70.0.jar
59K grpc-stub-1.70.0.jar
98K grpc-util-1.70.0.jar
9.4M grpc-xds-1.70.0.jar
243K gson-2.9.0.jar
2.9M guava-33.4.8-jre.jar
92K hadoop-aliyun-3.5.0-SNAPSHOT.jar
910K hadoop-aws-3.5.0-SNAPSHOT.jar
810K hadoop-azure-3.5.0-SNAPSHOT.jar
33K hadoop-azure-datalake-3.5.0-SNAPSHOT.jar
68K hadoop-cos-3.5.0-SNAPSHOT.jar
135K hadoop-gcp-3.5.0-SNAPSHOT.jar
142K hadoop-huaweicloud-3.5.0-SNAPSHOT.jar
250K hadoop-tos-3.5.0-SNAPSHOT.jar
762K httpclient-4.5.13.jar
933K httpclient5-5.5.jar
321K httpcore-4.4.13.jar
888K httpcore5-5.3.6.jar
236K httpcore5-h2-5.3.4.jar
100K ini4j-0.5.4.jar
12K j2objc-annotations-3.0.0.jar
462K jackson-core-2.14.3.jar
7.6K java-trace-api-0.2.11-beta.jar
26K javax.annotation-api-1.3.2.jar
320K jdom2-2.0.6.1.jar
88K jettison-1.5.4.jar
575K jetty-util-9.4.57.v20241219.jar
65K jetty-util-ajax-9.4.57.v20241219.jar
3.7K jspecify-1.0.0.jar
19K jsr305-3.0.2.jar
2.1K listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
347K opencensus-api-0.31.1.jar
23K opencensus-contrib-http-util-0.31.1.jar
155K opentelemetry-api-1.47.0.jar
48K opentelemetry-context-1.47.0.jar
8.1K opentelemetry-gcp-resources-1.37.0-alpha.jar
6.6K opentelemetry-sdk-1.47.0.jar
54K opentelemetry-sdk-common-1.47.0.jar
20K opentelemetry-sdk-extension-autoconfigure-spi-1.47.0.jar
53K opentelemetry-sdk-logs-1.47.0.jar
322K opentelemetry-sdk-metrics-1.47.0.jar
129K opentelemetry-sdk-trace-1.47.0.jar
73K opentelemetry-semconv-1.29.0-alpha.jar
275K org.jacoco.agent-0.8.5-runtime.jar
6.8K perfmark-api-0.27.0.jar
1.9M proto-google-cloud-monitoring-v3-3.52.0.jar
980K proto-google-cloud-storage-v2-2.52.0.jar
2.6M proto-google-common-protos-2.55.2.jar
182K proto-google-iam-v1-1.50.2.jar
521K protobuf-java-2.5.0.jar
71K protobuf-java-util-3.25.5.jar
125K re2j-1.1.jar
11K reactive-streams-1.0.3.jar
91K shared-resourcemapping-0.33.0.jar
40K slf4j-api-1.7.36.jar
503K threetenbp-1.7.0.jar
980K ve-tos-java-sdk-hadoop-2.8.9.jar
433K wildfly-openssl-2.1.4.Final.jar
{code}
The default settings will produce something a lot leaner:
{code}
2.4M analyticsaccelerator-s3-1.3.0.jar
10K azure-keyvault-core-1.0.0.jar
796K azure-storage-7.0.1.jar
346K commons-codec-1.15.jar
69K commons-logging-1.3.0.jar
910K hadoop-aws-3.5.0-SNAPSHOT.jar
810K hadoop-azure-3.5.0-SNAPSHOT.jar
33K hadoop-azure-datalake-3.5.0-SNAPSHOT.jar
68K hadoop-cos-3.5.0-SNAPSHOT.jar
135K hadoop-gcp-3.5.0-SNAPSHOT.jar
142K hadoop-huaweicloud-3.5.0-SNAPSHOT.jar
250K hadoop-tos-3.5.0-SNAPSHOT.jar
762K httpclient-4.5.13.jar
321K httpcore-4.4.13.jar
575K jetty-util-9.4.57.v20241219.jar
65K jetty-util-ajax-9.4.57.v20241219.jar
433K wildfly-openssl-2.1.4.Final.jar
{code}
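To put numbers on how much bundle.jar dominates, the sizes in a listing like the ones above can be totalled with a short script. This is only a minimal sketch: it assumes the listing is `du -h`-style text with one `<size> <name>` pair per line and 1024-based unit suffixes, and the function names (`parse_listing`, `summarize`) are hypothetical, not part of any Hadoop tooling. Because the listed sizes are already rounded, the totals are approximate.

```python
import re

# Multipliers for the size suffixes seen in the listing (assumed 1024-based).
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# One listing entry: a number, a unit suffix, whitespace, then the file name.
LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)([BKMG])\s+(\S+)\s*$")


def parse_listing(text):
    """Return {jar name: approximate size in bytes} for a '<size> <name>' listing."""
    sizes = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # skip blank lines and anything that is not a listing entry
            value, unit, name = match.groups()
            sizes[name] = int(float(value) * UNITS[unit])
    return sizes


def summarize(text):
    """Return (total bytes, name of the largest jar, its share of the total)."""
    sizes = parse_listing(text)
    if not sizes:
        raise ValueError("no '<size> <name>' entries found in listing")
    total = sum(sizes.values())
    largest = max(sizes, key=sizes.get)
    return total, largest, sizes[largest] / total


if __name__ == "__main__":
    # A few entries copied from the full listing; paste the whole listing
    # here to get the real totals.
    sample = """
    612M bundle-2.29.52.jar
    9.4M grpc-xds-1.70.0.jar
    910K hadoop-aws-3.5.0-SNAPSHOT.jar
    293B grpc-context-1.70.0.jar
    """
    total, largest, share = summarize(sample)
    print(f"{total / 1024 ** 2:.1f} MiB total; {largest} is {share:.1%} of it")
```

Running `summarize` over both listings gives a rough comparison of the "everything" distribution against the default one.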
> hadoop binary distribution to move cloud connectors to hadoop common/lib
> ------------------------------------------------------------------------
>
> Key: HADOOP-19696
> URL: https://issues.apache.org/jira/browse/HADOOP-19696
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure, fs/gcs, fs/huawei, fs/s3
> Affects Versions: 3.4.2
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Place all the cloud connector hadoop-* artifacts and dependencies into
> hadoop/common/lib so that the stores can be directly accessed.
> * filesystem operations against abfs, s3a, gcs, etc. don't need any setup
> effort.
> * Releases without the aws bundle.jar can be trivially updated by adding any
> version of the sdk libraries to the common/lib dir.
> This adds a lot more stuff into the distribution, so I'm going with the
> following design:
> * all hadoop-* modules in common/lib
> * minimal dependencies for hadoop-azure and hadoop-gcs (once we get those
> right!)
> * hadoop-aws: everything except bundle.jar
> * other connectors: only included with explicit profiles.
> ASF releases will support azure out of the box, the others once you add the
> dependencies. And anyone can build their own release with everything.
> One concern here: we make the hadoop-cloud-storage artifact incomplete, in
> that it does not pull in everything when depended on. We may need a separate
> module for the distro setup.
> While doing this, I noticed that the hadoop-tos component is shaded and
> includes stuff (httpclient5) that we need under control. Filed HADOOP-19708
> and am incorporating it here.