[ 
https://issues.apache.org/jira/browse/HADOOP-19696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040807#comment-18040807
 ] 

ASF GitHub Bot commented on HADOOP-19696:
-----------------------------------------

steveloughran commented on PR #8094:
URL: https://github.com/apache/hadoop/pull/8094#issuecomment-3580952919

   @pan3793 happy to explain, and happy to hear your concerns.
   
   The build is set to keep out all the new transitive dependencies, as much 
for security reasons as for size: fewer dependencies, fewer dependabot alerts 
about CVEs, fewer jars to keep updating.
   
   But hadoop-gcs and hadoop-cos (or is it hadoop-tos? it's 3.5+ only I 
think) both build shaded releases with all their dependencies in their JARs if 
you do a -Pdist build, which of course ASF releases do. This makes for bigger 
artifacts, but not so big that they create a distribution problem, and the 
shading makes direct use of the fs easier, whether through an import of the jar 
or the hadoop-cloud-storage pom. Changing that packaging to drop the shading 
would add many, many more libraries to hadoop common (all enumerated and listed 
in LICENSE-binary), and would complicate use through maven declarations: 
classpath hell.
   
   Although I don't use the Aliyun, tos, cos, or volcano connectors myself, I 
don't want to do anything to stop them being used, and these changes make it 
easy for people to build a hadoop release with out-of-the-box support for them.
   
   FWIW, at Cloudera we
   * keep hadoop-aws and hadoop-azure in tools/lib
   * and add google gcs (not the one in trunk, the original)
   * plus some associated auth/permissions
   * somehow add them to the classpath for FS commands
   * *and* strip out the hadoop-cos artifact, as its sdk caused problems talking 
to newer s3 regions: an outdated `mozilla/public-suffix-list.txt` resource 
confused the aws sdk about which domains were top-level. Good bug to track 
down :)
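   
   Tracking down a stale bundled resource like that mostly comes down to 
finding which jars in a lib directory ship a given entry. A minimal sketch of 
that check, not part of the PR: the function name `jars_containing` and the 
directory passed to it are hypothetical, and it only inspects jar entries by 
exact name rather than modelling real classloader ordering.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, resource):
    """Return the sorted names of jars directly under lib_dir that bundle `resource`.

    `resource` is the entry path inside the jar, for example
    "mozilla/public-suffix-list.txt". Hypothetical helper for illustration.
    """
    matches = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                # Jar files are zip archives; namelist() gives every entry path.
                if resource in zf.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that have a .jar suffix but are not valid archives.
            continue
    return matches
```

   Calling it on a distribution's lib directory with 
`"mozilla/public-suffix-list.txt"` would list every jar carrying its own copy; 
more than one match means the copy that wins depends on classpath order.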
   
   




> hadoop binary distribution to move cloud connectors to hadoop common/lib
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-19696
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19696
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure, fs/gcs, fs/huawei, fs/s3
>    Affects Versions: 3.4.2
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>
> Place all the cloud connector hadoop-* artifacts and dependencies into 
> hadoop/common/lib so that the stores can be directly accessed.
> * filesystem operations against abfs, s3a, gcs, etc don't need any effort 
> setting things up. 
> * Releases without the aws bundle.jar can be trivially updated by adding any 
> version of the sdk libraries to the common/lib dir. 
> This adds a lot more stuff into the distribution, so I'm proposing the 
> following design:
> * all hadoop-* modules in common/lib
> * minimal dependencies for hadoop-azure and hadoop-gcs (once we get those 
> right!)
> * hadoop-aws: everything except bundle.jar
> * other connectors: only included with explicit profiles.
> ASF releases will support azure out of the box, and the others once you add 
> the dependencies. And anyone can build their own release with everything.
> One concern here: this makes the hadoop-cloud-storage artifact incomplete, as 
> it no longer pulls in everything when depended on. We may need a separate 
> module for the distro setup.
> Noticed during this that the hadoop-tos component is shaded and includes 
> stuff (httpclient5) that we need under control. Filed HADOOP-19708 and am 
> incorporating it here. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
