[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973613#comment-15973613
 ] 

Steve Loughran commented on HADOOP-11656:
-----------------------------------------

[~ctubbsii]: this is about client side classpath dependencies, not server. If 
you want to know why, look at HADOOP-10101 to see coverage of just one JAR, 
then consider also Jackson 1.x, Jackson 2.x, Jersey, and other widely used 
things. The ones which cause the most problems are those for IPC: protobuf and 
Avro, where the generated code has to be in perfect sync with the version of 
classes generated by the protoc compiler and compiled into the archives.
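(Not part of the original comment: when two copies of something like protobuf 
or Jackson end up on a client's classpath, a quick way to see which copy 
actually won is to ask the JVM where it loaded the class from. A minimal 
diagnostic sketch, assuming nothing beyond the JDK; the class name passed in 
is whatever you are investigating.)

```java
// Diagnostic sketch: report which jar (or directory) a class was loaded from.
// Useful when two versions of a library are both on the classpath and you
// need to know which one the JVM picked up.
public class WhichJar {
    public static String locate(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // Bootstrap-loaded classes (e.g. java.lang.String) have no CodeSource.
        return src == null ? "<bootstrap classpath>" : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // Pass e.g. com.google.protobuf.Message if protobuf is on your classpath.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locate(Class.forName(name)));
    }
}
```

Running it against the same class name under two different launch scripts is 
often enough to explain a {{NoSuchMethodError}} at the client.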

bq. Has the upstream Hadoop community considered other possible options, 

yes

bq. such as better semantic versioning

Requires fundamental change across the entire Java stack, and doesn't handle 
the problem of a downstream app wanting to use a version of protobuf 
incompatible with the one Hadoop's generated classes depend on, etc. Also, the 
whole notion of "semantically compatible" is one we could discuss for a long 
time. Suffice it to say, even though we like to maintain semantic 
compatibility, the fact that protobuf 2.5 doesn't link against classes 
generated by protoc 2.4 means we are fighting a losing battle here.

bq. , modularity, 

what exactly do you mean here?

bq. updated dependencies

see HADOOP-9991

bq. marking dependencies "optional",

This is why we are splitting things up, e.g. having a separate 
{{hadoop-hdfs-client}} artifact distinct from the server-side code.
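As an illustration only (not from the comment): a downstream project that only 
talks to HDFS would depend on the client artifact rather than the full server 
module; the version below is a placeholder, not a recommendation.

```xml
<!-- Hypothetical downstream pom fragment: depend on the client-side
     artifact instead of the full hadoop-hdfs server module, so server-only
     third-party dependencies stay off the client classpath. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>3.0.0</version> <!-- placeholder version -->
</dependency>
```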

bq. relying on user-defined classpath at runtime, etc., 

Requires fundamental changes to both build-time and runtime isolation in the 
JVM and its toolchain. We're actually looking forward to Java 9 here; keep an 
eye on HADOOP-11123.

bq. as an alternative to shading/bundling

We are not fans of shading; we recognise its fundamental wrongness, as well as 
its adverse consequences, both in maintenance/admin ("does this 
aggregate/shaded app include something insecure/license-incompatible?") and in 
unwanted side effects (a recent example: HADOOP-14138). But right now we don't 
have any way to stop changes in Hadoop's dependencies from breaking things 
downstream: we pull in so many things server side, and the need to avoid 
breaking things constrains what we can do. Minimising changes hampers our 
ability to use the best tools from others; being aggressive about dependencies 
would destroy the well-being of everything downstream.

As noted, this is about the client side. Server-side: not bundled/shaded, same 
as it ever was. You can even skip the shading by building with -DskipShade. 
Downstream apps, such as HBase, will pick up the shaded artifacts: you will 
have to build them and pass them on. That's the only way we can decouple their 
dependencies within the constraints of Java's current isolation model. Java 9 
will change this, hopefully for the better.



> Classpath isolation for downstream clients
> ------------------------------------------
>
>                 Key: HADOOP-11656
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11656
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Blocker
>              Labels: classloading, classpath, dependencies, scripts, shell
>         Attachments: HADOOP-11656_proposal.md
>
>
> Currently, Hadoop exposes downstream clients to a variety of third party 
> libraries. As our code base grows and matures we increase the set of 
> libraries we rely on. At the same time, as our user base grows we increase 
> the likelihood that some downstream project will run into a conflict while 
> attempting to use a different version of some library we depend on. This has 
> already happened with, e.g., Guava several times for HBase, Accumulo, and Spark 
> (and I'm sure others).
> While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
> off and they don't do anything to help dependency conflicts on the driver 
> side or for folks talking to HDFS directly. This should serve as an umbrella 
> for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients
> 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
> doesn't pull in any third party dependencies
> 2) only see our public API classes (or as close to this as feasible) when 
> executing user provided code, whether client side in a launcher/driver or on 
> the cluster in a container or within MR.
> This provides us with a double benefit: users get less grief when they want 
> to run substantially ahead or behind the versions we need and the project is 
> freer to change our own dependency versions because they'll no longer be in 
> our compatibility promises.
> Project specific task jiras to follow after I get some justifying use cases 
> written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
