[
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343673#comment-14343673
]
Steve Loughran commented on HADOOP-11656:
-----------------------------------------
I'm not trying to stop this work; I do agree that it needs fixing. I'm just
wondering how to do this in a way which (a) has tangible immediate benefits in
2015 and (b) keeps Hadoop 3.x a low-cost, low-risk update, not a Perl 6 or Python 3.
Maybe there are multiple strategies to take here, short term and long term.
Short term (2.x)
# Hadoop works across all shipping Guava versions, so update it in 2.8 (giving
a warning in 2.7 that this is the last release before the update)
# get the OSGi patches in, so that anyone who wants to use Hadoop 2.x code
within an OSGi-enabled JVM can.
Longer term (3.x)
# split client/server artifacts, with a leaner client (which can still use
Guava, protobuf, SLF4J &c); just strip the pure server-side code out of
HDFS, so that the client at least pulls in fewer dependencies.
# maybe a pure-REST client built on Jersey (and its dependencies), supporting
SPNEGO-authenticated interaction with WebHDFS, YARN and other apps. This will
underperform compared to in-cluster HDFS apps, but should be sufficient for
remote interaction.
# classpath isolation as proposed here (somehow)
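To illustrate the pure-REST option: WebHDFS operations are plain HTTP requests
of the form http://<host>:<port>/webhdfs/v1/<path>?op=<OP>, so a thin client
needs little beyond URL construction and an HTTP library. A minimal Java
sketch, assuming a hypothetical helper class WebHdfsUrls (not part of Hadoop)
and a made-up NameNode host and port; SPNEGO authentication and the actual
HTTP call are left out:

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper, not part of Hadoop: builds WebHDFS REST URLs. */
final class WebHdfsUrls {
    private WebHdfsUrls() {}

    /**
     * Builds the URL for a WebHDFS operation such as LISTSTATUS or OPEN.
     * Host, port and path are supplied by the caller; nothing is sent.
     */
    static URI forOp(String host, int port, String hdfsPath, String op) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        try {
            // The multi-argument URI constructor percent-encodes characters
            // that are illegal in a path, e.g. spaces.
            return new URI("http", null, host, port,
                    "/webhdfs/v1" + hdfsPath, "op=" + op, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

A caller would pass the resulting URI to whatever HTTP client it already has,
which is the point: nothing here needs Guava, protobuf or the Hadoop jars.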
> Classpath isolation for downstream clients
> ------------------------------------------
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Labels: classloading, classpath, dependencies
>
> Currently, Hadoop exposes downstream clients to a variety of third party
> libraries. As our code base grows and matures we increase the set of
> libraries we rely on. At the same time, as our user base grows we increase
> the likelihood that some downstream project will run into a conflict while
> attempting to use a different version of some library we depend on. This has
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to
> off and they don't do anything to help dependency conflicts on the driver
> side or for folks talking to HDFS directly. This should serve as an umbrella
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when
> executing user provided code, whether client side in a launcher/driver or on
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want
> to run substantially ahead or behind the versions we need and the project is
> freer to change our own dependency versions because they'll no longer be in
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases
> written in the comments.
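
One way to picture goal 2 in the description above ("only see our public API
classes") is a class loader that hides everything outside an allow-list of
package prefixes from user code. This is only a sketch under assumptions:
ApiFilteringClassLoader is a hypothetical name, the prefix list is
illustrative, and it is not the mechanism proposed in this issue, YARN-286 or
MAPREDUCE-1700.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: exposes only allow-listed packages to user code. */
final class ApiFilteringClassLoader extends ClassLoader {
    private final List<String> visiblePrefixes;

    ApiFilteringClassLoader(ClassLoader parent, List<String> visiblePrefixes) {
        super(parent);
        this.visiblePrefixes = new ArrayList<>(visiblePrefixes);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        // Core JDK classes and allow-listed packages are delegated to the
        // parent; anything else is reported as missing to the caller.
        if (name.startsWith("java.") || isVisible(name)) {
            return super.loadClass(name, resolve);
        }
        throw new ClassNotFoundException(name + " is hidden from user code");
    }

    private boolean isVisible(String className) {
        for (String prefix : visiblePrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A real implementation would also need to let user code load its own copies of
the hidden libraries (for example its own Guava), which this sketch does not
attempt.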