[ https://issues.apache.org/jira/browse/HADOOP-19696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039449#comment-18039449 ]

ASF GitHub Bot commented on HADOOP-19696:
-----------------------------------------

steveloughran commented on code in PR #8094:
URL: https://github.com/apache/hadoop/pull/8094#discussion_r2543140022


##########
LICENSE-binary:
##########
@@ -536,3 +549,8 @@ Public Domain
 -------------
 
 aopalliance:aopalliance:1.0
+

Review Comment:
   cut



##########
hadoop-cloud-storage-project/hadoop-cloud-storage-dist/pom.xml:
##########
@@ -0,0 +1,281 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+   https://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-project</artifactId>
+    <version>3.4.3-SNAPSHOT</version>
+    <relativePath>../../hadoop-project</relativePath>
+  </parent>
+  <artifactId>hadoop-cloud-storage-dist</artifactId>
+  <version>3.4.3-SNAPSHOT</version>
+  <packaging>jar</packaging>
+
+  <description>Apache Hadoop Cloud Storage Distribution</description>
+  <name>Apache Hadoop Cloud Storage Distribution</name>
+
+  <!--
+  This pulls in all the artifacts to copy into common/lib and so put into
+  the Hadoop distro and onto the classpath.
+
+  The assembly file /hadoop-assemblies/src/main/resources/assemblies/hadoop-cloud-storage.xml
+  is processed to define the layout and to add extra files alongside
+  the Jars.
+
+  By default, while hadoop-* artifacts are all included, dependencies
+  are omitted for all cloud connectors except hadoop-azure and
+  possibly hadoop-gcp and hadoop-tos modules.
+  For hadoop-aws the AWS SDK bundle.jar is omitted, but everything else is included.
+
+   * Keeps the binary release size below the limit of Apache distributions
+   * Reduces download and size overhead in docker usage.
+   * Reduces the CVE attack surface
+   * Reduces the risk of classpath conflict.
+
+  To produce a build with the specific desired dependencies, the build must be executed
+  with the relevant profile of ${module}-package.
+
+  For example, to include the hadoop-aws and hadoop-azure-datalake dependencies,
+  build with -Dhadoop-aws-package -Dhadoop-azure-datalake-package
+
+  Available package profiles:
+    hadoop-aws-package

Review Comment:
   restore hadoop-aliyun-package docs
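
For readers unfamiliar with the ${module}-package convention: a Maven profile can be activated just by defining a property, which is how a bare -Dhadoop-aws-package can switch extra dependencies on. The fragment below is a minimal sketch under assumptions; the profile body, the software.amazon.awssdk:bundle coordinates, and the version being supplied by the parent hadoop-project dependency management are not taken from the truncated diff, and this is not the PR's actual definition.

```xml
<!-- Hypothetical sketch, not the PR's profile: contents are assumptions. -->
<profiles>
  <profile>
    <id>hadoop-aws-package</id>
    <activation>
      <!-- Active whenever the property is defined, e.g. mvn ... -Dhadoop-aws-package -->
      <property>
        <name>hadoop-aws-package</name>
      </property>
    </activation>
    <dependencies>
      <!-- Adds back the AWS SDK bundle that the default dist build leaves out.
           Version assumed to be managed by the parent hadoop-project POM. -->
      <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>bundle</artifactId>
        <scope>compile</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

With a profile shaped like this, -Phadoop-aws-package (activation by id) would have the same effect as -Dhadoop-aws-package (activation by property).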



##########
BUILDING.txt:
##########
@@ -385,6 +385,49 @@ Create a local staging version of the website (in /tmp/hadoop-site)
 
 Note that the site needs to be built in a second pass after other artifacts.
 
+----------------------------------------------------------------------------------
+Including Cloud Connector Dependencies in Distributions:
+
+Hadoop distributions include the hadoop modules needed to work with data and services
+on cloud infrastructure.
+
+However, dependencies are omitted for all cloud connectors except hadoop-azure
+(abfs:// and wasb://) and possibly hadoop-gcp (gs://) and hadoop-tos (tos://).
+For those two modules, whether dependencies are included depends on the shading options.
+
+For hadoop-aws the AWS SDK bundle.jar is omitted, but everything else is included.
+
+Excluding the extra binaries:
+* Keeps release artifact size below the limit of the ASF distribution network.
+* Reduces download and size overhead in docker usage.
+* Reduces the CVE attack surface and audit-related complaints about those same CVEs.
+* Reduces the risk of classpath conflict.
+
+To produce a build with the specific desired dependencies, the build must be executed
+with the relevant profile of ${module}-package alongside the -Pdist profile.
+
+For example, to include the hadoop-aws and hadoop-azure-datalake dependencies, run:
+
+ mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package
+
+Available package profiles:
+  hadoop-aws-package

Review Comment:
   restore hadoop-aliyun-package 





> hadoop binary distribution to move cloud connectors to hadoop common/lib
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-19696
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19696
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure, fs/gcs, fs/huawei, fs/s3
>    Affects Versions: 3.4.2
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>
> Place all the cloud connector hadoop-* artifacts and dependencies into 
> hadoop/common/lib so that the stores can be directly accessed.
> * Filesystem operations against abfs, s3a, gcs, etc. don't need any setup 
> effort.
> * Releases without the AWS bundle.jar can be trivially updated by adding any 
> version of the SDK libraries to the common/lib dir.
> This adds a lot more to the distribution, so I'm proposing the following 
> design:
> * all hadoop-* modules in common/lib
> * minimal dependencies for hadoop-azure and hadoop-gcs (once we get those 
> right!)
> * hadoop-aws: everything except bundle.jar
> * other connectors: only included with explicit profiles.
> ASF releases will support Azure out of the box, and the others once you add 
> the dependencies. Anyone can also build their own release with everything.
> One concern here: this makes the hadoop-cloud-storage artifact incomplete at 
> pulling in dependencies when other projects depend on it. We may need a 
> separate module for the distro setup.
> While doing this, I noticed that the hadoop-tos component is shaded and 
> includes libraries (httpclient5) that we need to keep under control. Filed 
> HADOOP-19708 and am incorporating it here.
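
The "hadoop-aws: everything except bundle.jar" rule above maps naturally onto a Maven dependency exclusion. A minimal sketch, assuming hadoop-aws pulls in the SDK as software.amazon.awssdk:bundle and that the dist module declares hadoop-aws directly; how the PR actually implements this is not visible in the diff.

```xml
<!-- Hypothetical sketch: ship hadoop-aws and its other dependencies, minus the SDK bundle. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <!-- The AWS SDK v2 bundle is a single large jar; users can add any SDK
         version to common/lib afterwards. Coordinates are an assumption. -->
    <exclusion>
      <groupId>software.amazon.awssdk</groupId>
      <artifactId>bundle</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Maven exclusions match on groupId and artifactId only, so this would drop the bundle whatever SDK version the build pins.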



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
