[
https://issues.apache.org/jira/browse/HADOOP-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756273#comment-16756273
]
Sean Busbey commented on HADOOP-15387:
--------------------------------------
let me think on this a bit. I think the hadoop-common thing is fixable without
too much heartburn.
the goal is a downstreamer adds {{org.apache.hadoop:hadoop-cloud-storage}} as a
dependency and things work, right? (again ignoring some specifics around "we
need these logging frameworks" etc)
is "things work" just for "I'm using FileSystem APIs to access the cloud
storage system X"? or is it some other subset of downstream facing APIs? I know
your original description said this should pull in the hadoop-client stuff, but
would it be too much to instead say use of {{hadoop-cloud-storage}} always
requires {{hadoop-client-api}} and {{hadoop-client-runtime}}? Specifically as
transitive dependencies, not like we'd make folks always add 3 entries to
maven? (though I suspect most practical uses will require listing one of those
directly if folks use {{dependency:analyze}})
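A rough sketch of what that could look like in a downstream pom.xml. This
illustrates the layout being floated here, not something confirmed to ship;
the {{hadoop.version}} property is a placeholder the downstream build would
have to define. {{hadoop-cloud-storage}} is the one required entry, and
{{hadoop-client-api}} is listed directly only because code compiled against
{{FileSystem}} would otherwise probably be reported by {{dependency:analyze}}
as using an undeclared dependency.

```xml
<dependencies>
  <!-- The single entry downstream users would be asked to add. Under the
       proposal it pulls in hadoop-client-api and hadoop-client-runtime
       transitively. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-cloud-storage</artifactId>
    <version>${hadoop.version}</version> <!-- placeholder property -->
  </dependency>

  <!-- Not required for things to work; declared directly so that
       dependency:analyze does not flag FileSystem usage as coming from an
       undeclared (transitive-only) dependency. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-api</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```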
would the individual SDKs being optional be too onerous? essentially it would
mean everyone would add {{hadoop-cloud-storage}} and they'd add the SDK(s) for
whichever of the implementations they were going to actually use. Or is the
common use case here the opposite? like most downstream users will want to
opt-out via maven exclusions rather than needing to opt-in? opt-in would mean
we could make it so only folks who specifically want to work with a provider
whose SDK leaks dependencies would be impacted by that leakage (and we'd keep
fixing it as the problem of the SDK provider, not ours).
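To make the two options concrete, here is a hedged sketch of each from the
downstream side. These are alternatives, not meant to be combined in one POM.
The AWS bundle ({{com.amazonaws:aws-java-sdk-bundle}}) is used purely as an
example SDK, and the {{hadoop.version}} and {{aws-java-sdk.version}}
properties are placeholders, not values taken from this issue.

```xml
<!-- Alternative A, opt-in: SDKs are optional in hadoop-cloud-storage, so a
     downstream user adds the SDK for each store they actually use. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-cloud-storage</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-bundle</artifactId>
    <version>${aws-java-sdk.version}</version>
  </dependency>
</dependencies>

<!-- Alternative B, opt-out: SDKs arrive transitively, and a downstream user
     excludes the ones they do not want. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-cloud-storage</artifactId>
    <version>${hadoop.version}</version>
    <exclusions>
      <exclusion>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-bundle</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>
```

With opt-in, any dependencies leaked by a given SDK only reach builds that
explicitly declare that SDK; with opt-out, every build gets them unless it
lists the matching exclusion.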
> Produce a shaded hadoop-cloud-storage JAR for applications to use
> -----------------------------------------------------------------
>
> Key: HADOOP-15387
> URL: https://issues.apache.org/jira/browse/HADOOP-15387
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/adl, fs/azure, fs/oss, fs/s3, fs/swift
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Priority: Major
>
> Produce a maven-shaded hadoop-cloud-storage JAR for downstream use so that
> * Hadoop dependency choices don't control their decisions
> * Little/No risk of their JAR changes breaking Hadoop bits they depend on
> This JAR would pull in the shaded hadoop-client JAR, and the aws-sdk-bundle
> JAR, neither of which would be unshaded (so yes, upgrading aws-sdks would be
> a bit risky, but double shading a pre-shaded 30MB JAR is excessive on
> multiple levels).
> Metrics of success: Spark, Tez, Flink etc can pick up and use, and all are
> happy
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)