[jira] [Commented] (HADOOP-15387) Produce a shaded hadoop-cloud-storage JAR for applications to use

Sean Busbey (JIRA) Wed, 30 Jan 2019 08:17:11 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756273#comment-16756273
 ]


Sean Busbey commented on HADOOP-15387:
--------------------------------------

let me think on this a bit. I think the hadoop-common thing is fixable without 
too much heartburn.

the goal is that a downstreamer adds {{org.apache.hadoop:hadoop-cloud-storage}} 
as a dependency and things work, right? (again ignoring some specifics around 
"we need these logging frameworks" etc)

is "things work" just for "I'm using FileSystem APIs to access the cloud 
storage system X"? or is it some other subset of downstream-facing APIs? I know 
your original description said this should pull in the hadoop-client stuff, but 
would it be too much to instead say that use of {{hadoop-cloud-storage}} always 
requires {{hadoop-client-api}} and {{hadoop-client-runtime}}? Specifically as 
transitive dependencies, so we wouldn't make folks always add 3 entries to 
maven (though I suspect most practical uses will require listing one of those 
directly if folks use {{dependency:analyze}}).

would making the individual SDKs optional be too onerous? essentially it would 
mean everyone adds {{hadoop-cloud-storage}} plus the SDK(s) for whichever 
implementation(s) they are actually going to use. Or is the common use case 
here the opposite, i.e. most downstream users will want to opt out via maven 
exclusions rather than needing to opt in? opt-in would mean that only folks 
who specifically want to work with a provider whose SDK leaks dependencies 
would be impacted by that leakage (and fixing it would stay the problem of 
the SDK provider and not us).
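
as a rough sketch of what the opt-in layout could look like in a downstream 
pom.xml: this assumes the SDKs become optional, which is only the proposal 
being discussed here and not current behaviour. The {{hadoop.version}} and 
{{aws-sdk.version}} properties are placeholders the downstream project would 
have to define itself, and the AWS bundle is just one example of a provider 
SDK.

```xml
<!-- Hypothetical downstream pom.xml fragment for the opt-in proposal. -->
<dependencies>
  <!-- The single entry every downstream project would add. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-cloud-storage</artifactId>
    <version>${hadoop.version}</version>
  </dependency>

  <!-- Would already arrive transitively; listed directly only because
       dependency:analyze tends to flag artifacts that are used but
       not declared. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-api</artifactId>
    <version>${hadoop.version}</version>
  </dependency>

  <!-- Opt-in: only projects that talk to this provider declare its SDK. -->
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-bundle</artifactId>
    <version>${aws-sdk.version}</version>
  </dependency>
</dependencies>
```

the opt-out alternative would drop the last entry and instead put an 
{{<exclusions>}} block on {{hadoop-cloud-storage}} for each SDK a project 
doesn't want.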

> Produce a shaded hadoop-cloud-storage JAR for applications to use
> -----------------------------------------------------------------
>
>                 Key: HADOOP-15387
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15387
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/adl, fs/azure, fs/oss, fs/s3, fs/swift
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> Produce a maven-shaded hadoop-cloud-storage JAR for downstream use so that
>  * Hadoop dependency choices don't control their decisions
>  * Little/No risk of their JAR changes breaking Hadoop bits they depend on
> This JAR would pull in the shaded hadoop-client JAR, and the aws-sdk-bundle 
> JAR, neither of which would be unshaded (so yes, upgrading aws-sdks would be 
> a bit risky, but double shading a pre-shaded 30MB JAR is excessive on 
> multiple levels).
> Metrics of success: Spark, Tez, Flink etc can pick up and use, and all are 
> happy



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
