[
https://issues.apache.org/jira/browse/HADOOP-19696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039449#comment-18039449
]
ASF GitHub Bot commented on HADOOP-19696:
-----------------------------------------
steveloughran commented on code in PR #8094:
URL: https://github.com/apache/hadoop/pull/8094#discussion_r2543140022
##########
LICENSE-binary:
##########
@@ -536,3 +549,8 @@ Public Domain
-------------
aopalliance:aopalliance:1.0
+
Review Comment:
cut
##########
hadoop-cloud-storage-project/hadoop-cloud-storage-dist/pom.xml:
##########
@@ -0,0 +1,281 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ https://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+ <parent>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-project</artifactId>
+ <version>3.4.3-SNAPSHOT</version>
+ <relativePath>../../hadoop-project</relativePath>
+ </parent>
+ <artifactId>hadoop-cloud-storage-dist</artifactId>
+ <version>3.4.3-SNAPSHOT</version>
+ <packaging>jar</packaging>
+
+ <description>Apache Hadoop Cloud Storage Distribution</description>
+ <name>Apache Hadoop Cloud Storage Distribution</name>
+
+ <!--
+ This pulls in all the artifacts to copy into common/lib and so put into
+ the Hadoop distro and onto the classpath.
+
+ The assembly file /hadoop-assemblies/src/main/resources/assemblies/hadoop-cloud-storage.xml
+ is processed to define the layout and to add extra files alongside
+ the Jars.
+
+ By default, while hadoop-* artifacts are all included, dependencies
+ are omitted for all cloud connectors except hadoop-azure and
+ possibly hadoop-gcp and hadoop-tos modules.
+ For hadoop-aws the AWS SDK bundle.jar omitted, but everything else is included.
+
+ * This keeps binary release size below the limit of apache distributions
+ * Reduces download and size overhead in docker usage.
+ * Reduces the CVE attack surface
+ * Reduces the risk of classpath conflict.
+
+ To produce a build with the specific desired dependencies, the build must be executed
+ with the relevant profile of ${module}-package.
+
+ For example, a build with the hadoop-aws and hadoop-azure-datalake dependencies,
+ build with -Dhadoop-aws-package -Dhadoop-azure-datalake-package
+
+ Available package profiles:
+ hadoop-aws-package
Review Comment:
restore hadoop-aliyun-package docs
##########
BUILDING.txt:
##########
@@ -385,6 +385,49 @@ Create a local staging version of the website (in /tmp/hadoop-site)
Note that the site needs to be built in a second pass after other artifacts.
+----------------------------------------------------------------------------------
+Including Cloud Connector Dependencies in Distributions:
+
+Hadoop distributions include the hadoop modules needed to work with data and services
+on cloud infrastructure
+
+However, dependencies are omitted for all cloud connectors except hadoop-azure
+(abfs:// and wasb://) and possibly hadoop-gcp (gs://) and hadoop-tos (tos://).
+For the latter two modules, it depends on shading options.
+
+For hadoop-aws the AWS SDK bundle.jar is omitted, but everything else is included.
+
+Excluding the extra binaries:
+* Keeps release artifact size below the limit of the ASF distribution network.
+* Reduces download and size overhead in docker usage.
+* Reduces the CVE attack surface and audit-related complaints about those same CVEs.
+* Reduces the risk of classpath conflict.
+
+To produce a build with the specific desired dependencies, the build must be executed
+with the relevant profile of ${module}-package alongside the -Pdist profile.
+
+For example, a build with the hadoop-aws and hadoop-azure-datalake dependencies,
+run with
+
+ mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package
+
+Available package profiles:
+ hadoop-aws-package
Review Comment:
restore hadoop-aliyun-package
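
The `${module}-package` naming convention in the BUILDING.txt hunk above can be sketched as a small shell helper. This is only an illustration: `package_flags` is a hypothetical function, not part of the PR or of the Hadoop build, and it assumes each requested module has a matching `-package` profile. The script prints the resulting command line rather than running Maven.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Hadoop build): turn a list of module
# names into the -D<module>-package flags described in BUILDING.txt.
package_flags() {
  flags=""
  for module in "$@"; do
    # Append one flag per module, space-separated, with no leading space.
    flags="${flags:+$flags }-D${module}-package"
  done
  printf '%s\n' "$flags"
}

# Print (do not execute) the build command for two connectors.
echo "mvn package -Pdist -DskipTests $(package_flags hadoop-aws hadoop-azure-datalake)"
```

For the two modules shown, the printed command matches the example given in the BUILDING.txt hunk.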
> hadoop binary distribution to move cloud connectors to hadoop common/lib
> ------------------------------------------------------------------------
>
> Key: HADOOP-19696
> URL: https://issues.apache.org/jira/browse/HADOOP-19696
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure, fs/gcs, fs/huawei, fs/s3
> Affects Versions: 3.4.2
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Place all the cloud connector hadoop-* artifacts and dependencies into
> hadoop/common/lib so that the stores can be directly accessed.
> * filesystem operations against abfs, s3a, gcs, etc don't need any effort
> setting things up.
> * Releases without the aws bundle.jar can be trivially updated by adding any
> version of the sdk libraries to the common/lib dir.
> This adds a lot more stuff into the distribution, so I'm doing the following
> design
> * all hadoop-* modules in common/lib
> * minimal dependencies for hadoop-azure and hadoop-gcs (once we get those
> right!)
> * hadoop-aws: everything except bundle.jar
> * other connectors: only included with explicit profiles.
> ASF releases will support azure out the box, the others once you add the
> dependencies. And anyone can build their own release with everything
> One concern here, we make hadoop-cloud-storage artifact incomplete at pulling
> in things when depended on. We may need a separate module for the distro
> setup.
> Noticed during this that the hadoop-tos component is shaded and includes
> stuff (httpclient5) that we need under control. Filed HADOOP-19708 and
> incorporating here.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)