[ https://issues.apache.org/jira/browse/HADOOP-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756273#comment-16756273 ]
Sean Busbey commented on HADOOP-15387:
--------------------------------------

Let me think on this a bit. I think the hadoop-common thing is fixable without too much heartburn.

The goal is that a downstreamer adds {{org.apache.hadoop:hadoop-cloud-storage}} as a dependency and things work, right? (Again ignoring some specifics around "we need these logging frameworks" etc.)

Is "things work" just for "I'm using FileSystem APIs to access the cloud storage system X"? Or is it some other subset of downstream-facing APIs?

I know your original description said this should pull in the hadoop-client stuff, but would it be too much to instead say that use of {{hadoop-cloud-storage}} always requires {{hadoop-client-api}} and {{hadoop-client-runtime}}? Specifically as transitive dependencies, not like we'd make folks always add 3 entries to maven. (Though I suspect most practical uses will require listing one of those directly if folks use {{dependency:analyze}}.)

Would the individual SDKs being optional be too onerous? Essentially it would mean everyone would add {{hadoop-cloud-storage}} and then add the SDK(s) for whichever of the implementations they were actually going to use. Or is the common use case here the opposite? Like most downstream users will want to opt out via maven exclusions rather than needing to opt in?

Opt-in would mean we could make it so only folks who specifically want to work with a provider whose SDK leaks dependencies would be impacted by that leakage (and we'd keep treating the fix as the SDK provider's problem and not ours).

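To make the opt-in vs. opt-out question concrete, here is a rough sketch of what a downstream pom might look like in each case. These are illustrations only: the {{hadoop.version}} and {{aws-sdk.version}} properties are placeholders, {{com.amazonaws:aws-java-sdk-bundle}} stands in for whichever provider SDK is wanted, and the opt-in layout assumes {{hadoop-cloud-storage}} were changed to declare its SDKs as optional, which is the proposal under discussion and not a description of what the module does today.

Opt-in (hypothetical): the downstream project adds {{hadoop-cloud-storage}} plus only the SDK it uses.

```xml
<!-- Sketch only: assumes hadoop-cloud-storage marks provider SDKs as optional. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-cloud-storage</artifactId>
    <version>${hadoop.version}</version> <!-- placeholder property -->
  </dependency>
  <!-- Explicitly opt in to the one provider SDK this project needs. -->
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-bundle</artifactId>
    <version>${aws-sdk.version}</version> <!-- placeholder property -->
  </dependency>
</dependencies>
```

Opt-out: every SDK arrives transitively and the downstream project excludes the ones it does not want.

```xml
<!-- Sketch only: SDKs are pulled in transitively and removed via exclusions. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-cloud-storage</artifactId>
    <version>${hadoop.version}</version> <!-- placeholder property -->
    <exclusions>
      <!-- Drop a provider SDK this project never uses. -->
      <exclusion>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-bundle</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>
```

In the opt-in shape, any dependency leakage from an SDK only reaches projects that explicitly list that SDK; in the opt-out shape, every downstream project sees it unless it adds an exclusion.
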
> Produce a shaded hadoop-cloud-storage JAR for applications to use
> -----------------------------------------------------------------
>
>                 Key: HADOOP-15387
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15387
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/adl, fs/azure, fs/oss, fs/s3, fs/swift
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> Produce a maven-shaded hadoop-cloud-storage JAR for downstream use so that
> * Hadoop dependency choices don't control their decisions
> * Little/No risk of their JAR changes breaking Hadoop bits they depend on
> This JAR would pull in the shaded hadoop-client JAR, and the aws-sdk-bundle
> JAR, neither of which would be unshaded (so yes, upgrading aws-sdks would be
> a bit risky, but double shading a pre-shaded 30MB JAR is excessive on
> multiple levels).
> Metrics of success: Spark, Tez, Flink etc can pick up and use, and all are
> happy

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org