[
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973613#comment-15973613
]
Steve Loughran commented on HADOOP-11656:
-----------------------------------------
[~ctubbsii]: this is about client side classpath dependencies, not server. If
you want to know why, look at HADOOP-10101 to see coverage of just one JAR,
then consider also Jackson 1.x, jackson 2.x, jersey, and other widely used
things. The ones which cause the most problems are those for IPC: protobuf,
avro, where the generated code has to be in perfect sync with the version
of classes generated by the protoc compiler and compiled into the archives.
bq. Has the upstream Hadoop community considered other possible options,
yes
bq. such as better semantic versioning
requires fundamental change across the entire java stack, doesn't handle the
problems of a downstream app wanting to use a version of protobuf incompatible
with the version Hadoop's generated classes depend on, etc. Also the whole
notion of "semantically compatible" is one we could discuss for a long time.
Suffice it to say, even though we like to maintain semantic compatibility, the
fact that protobuf 2.5 doesn't link against classes generated by protoc 2.4
means that we are fighting a losing battle here.
bq. , modularity,
what exactly do you mean here?
bq. updated dependencies
see HADOOP-9991
bq. marking dependencies "optional",
This is why we are splitting things up, for example by separating a
{{hadoop-hdfs-client}} artifact from the server-side code.
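As a sketch of how a downstream build would use that split (the version number here is purely illustrative), a client app's POM would depend on the client artifact rather than the full server module:

```xml
<!-- Sketch: depend on the HDFS client artifact instead of the full
     hadoop-hdfs server module, so server-only third-party dependencies
     stay off the application's classpath. Version is illustrative. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>3.0.0</version>
</dependency>
```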
bq. relying on user-defined classpath at runtime, etc.,
Requires fundamental changes to both build-time and runtime isolation in the
JVM and its toolchain. We're actually looking forward to Java 9 here; keep an
eye on HADOOP-11123.
bq. as an alternative to shading/bundling
we are not fans of shading; we recognise its fundamental wrongness, as well as
its adverse consequences, both in maintenance/admin ("does this
aggregate/shaded app include something insecure or license-incompatible?") and
in unwanted side effects (a recent example being HADOOP-14138). But right now
we don't have any way to stop changes in Hadoop's dependencies from breaking
things downstream: we pull in so many things server side, and the need to
avoid breaking downstream projects constrains what we can do. Minimising
dependency changes hampers our ability to use the best tools from others,
while being aggressive about upgrading dependencies would destroy the
well-being of everything downstream.
As noted, this is about the client side. Server side: nothing is
bundled/shaded, same as it ever was. You can even skip the shading by building
with {{-DskipShade}}. Downstream apps, such as HBase, will pick up the shaded
artifacts: you will have to build the shaded artifacts and give them to those
apps. That's the only way we can decouple their dependencies within the
constraints of Java's current isolation model. Java 9 will change this,
hopefully for the better.
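A sketch of how a downstream app such as HBase would consume those shaded
client artifacts (artifact names follow the shaded-client work under this
umbrella; the version number is illustrative):

```xml
<!-- Sketch: consume the shaded client artifacts so Hadoop's own
     dependencies are relocated out of the application's classpath.
     hadoop-client-api carries the public API classes; the runtime
     artifact carries the relocated third-party code. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.0.0</version>
  <scope>runtime</scope>
</dependency>
```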
> Classpath isolation for downstream clients
> ------------------------------------------
>
> Key: HADOOP-11656
> URL: https://issues.apache.org/jira/browse/HADOOP-11656
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Blocker
> Labels: classloading, classpath, dependencies, scripts, shell
> Attachments: HADOOP-11656_proposal.md
>
>
> Currently, Hadoop exposes downstream clients to a variety of third party
> libraries. As our code base grows and matures we increase the set of
> libraries we rely on. At the same time, as our user base grows we increase
> the likelihood that some downstream project will run into a conflict while
> attempting to use a different version of some library we depend on. This has
> already happened with e.g. Guava several times for HBase, Accumulo, and Spark
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to
> off and they don't do anything to help dependency conflicts on the driver
> side or for folks talking to HDFS directly. This should serve as an umbrella
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when
> executing user provided code, whether client side in a launcher/driver or on
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want
> to run substantially ahead or behind the versions we need and the project is
> freer to change our own dependency versions because they'll no longer be in
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases
> written in the comments.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)