[
https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075758#comment-14075758
]
Patrick Wendell commented on SPARK-2420:
----------------------------------------
I put some thought into this as well. One big issue (and this was frankly a
mistake in Spark's Java API design) is that we expose Guava's Optional type in
Spark's Java API. In general we should avoid relying on external types in any
of our APIs - that decision was made a long time ago, when we were a much
smaller project.
The reason why downgrading is bad for user applications is that it's not
something they can just "work around" by declaring a newer version of Guava in
their build. The whole issue here is that Guava 11 and 14 are not binary
compatible. I.e. if user code depends on Guava 14, and that gets pulled in,
then Spark will break. So users will actually have to roll back their source
code as well if it depends on newer Guava features. This is very disruptive
from a user perspective and I think it's tantamount to an API change, since
users will have to re-write code. It's in some ways worse than a Spark API
change, because we can't easily write a "downgrade guide" of Guava from 14 to
11 (there will simply be missing features).
I think the best solution here is to shade Guava. And by shade I mean actually
re-publish Guava under the org.spark-project namespace, as we have done with a
few other critical dependencies, and then depend on that in the Spark build.
This is much better than using something like the Maven shade plugin, which is
more of a hack.
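For comparison, the shade-plugin approach being set aside here would amount to
a relocation rule along these lines (the org.spark-project.guava target package
is an illustrative assumption, not an actual build change):

```xml
<!-- Sketch of maven-shade-plugin relocation; bytecode references to
     com.google.common.* would be rewritten at package time. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.spark-project.guava</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

Re-publishing a relocated artifact instead gives the same package renaming but
with a real, debuggable jar that other builds can depend on directly.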
Then the issue is our Java API, because that currently exposes the Guava
Optional class directly under its original namespace. I see two options: (i)
change Spark's API to return a Spark-specific optional class, or (ii) inline
the definition of Guava's Optional (under its original namespace) in Spark's
source code - it's a very simple class and has been stable across several
versions of Guava.
The only risk with (ii) is that if Guava makes an incompatible change to
Optional, we are in trouble. If that happens, we could always fall back to (i)
though in a future release.
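To give a sense of how small the surface area is, here is a minimal sketch of
what a Spark-owned optional class under option (i) could look like. The name
SparkOptional and its method set are illustrative assumptions, not a proposed
API; it covers only the present/absent/get/or operations user code typically
relies on.

```java
// Minimal sketch of a Spark-specific optional type (hypothetical name).
// Declared package-private here for brevity; a real version would live in
// a Spark package and be public.
final class SparkOptional<T> {
    private final T value; // null encodes "absent"

    private SparkOptional(T value) {
        this.value = value;
    }

    // Wrap a non-null value; rejects null to keep present/absent unambiguous.
    public static <T> SparkOptional<T> of(T value) {
        if (value == null) {
            throw new NullPointerException("value must not be null");
        }
        return new SparkOptional<T>(value);
    }

    // The shared "no value" case.
    public static <T> SparkOptional<T> absent() {
        return new SparkOptional<T>(null);
    }

    public boolean isPresent() {
        return value != null;
    }

    // Returns the value, or fails loudly if absent.
    public T get() {
        if (value == null) {
            throw new IllegalStateException("value is absent");
        }
        return value;
    }

    // Returns the value if present, otherwise the supplied default.
    public T or(T defaultValue) {
        return value != null ? value : defaultValue;
    }
}
```

Option (ii) would look essentially the same, just with the class kept under
Guava's original package name so existing user code keeps compiling.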
> Change Spark build to minimize library conflicts
> ------------------------------------------------
>
> Key: SPARK-2420
> URL: https://issues.apache.org/jira/browse/SPARK-2420
> Project: Spark
> Issue Type: Wish
> Components: Build
> Affects Versions: 1.0.0
> Reporter: Xuefu Zhang
> Attachments: spark_1.0.0.patch
>
>
> During the prototyping of HIVE-7292, many library conflicts showed up because
> the Spark build contains versions of libraries that are vastly different from
> the current major Hadoop version. It would be nice if we could choose versions
> that are in line with Hadoop, or shade them in the assembly. Here is the wish
> list:
> 1. Upgrade the protobuf version from the current 2.4.1 to 2.5.0.
> 2. Shade Spark's Jetty and servlet dependencies in the assembly.
> 3. Resolve the Guava version difference. Spark is using a higher version, and
> I'm not sure what the best solution is for this.
> The list may grow as HIVE-7292 proceeds.
> For information only, the attached is a patch that we applied on Spark in
> order to make Spark work with Hive. It gives an idea of the scope of changes.
--
This message was sent by Atlassian JIRA
(v6.2#6252)