[
https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075758#comment-14075758
]
Patrick Wendell commented on SPARK-2420:
----------------------------------------
I put some thought into this as well. One big issue (and this was frankly a
mistake in Spark's Java API design) is that we expose Guava's Optional type in
Spark's Java API. In general we should avoid relying on external types in any
of our APIs - that decision was made a long time ago, when we were a much
smaller project.
The reason why downgrading is bad for user applications is that it's not
something they can just "work around" by declaring a newer version of Guava in
their build. The whole issue here is that Guava 11 and 14 are not binary
compatible. I.e. if user code depends on Guava 14, and that gets pulled in,
then Spark will break. So users will actually have to roll back their source
code as well if it depends on newer Guava features. This is very disruptive
from a user perspective and I think it's tantamount to an API change, since
users will have to re-write code. It's in some ways worse than a Spark API
change, because we can't easily write a "downgrade guide" of Guava from 14 to
11 (there will simply be missing features).
I think the best solution here is to shade Guava. And by shade I mean actually
re-publish Guava under the org.spark-project namespace, as we have done with a
few other critical dependencies, and then depend on that in the Spark build.
This is much better than using something like the Maven shade plugin, which is
more of a hack.
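For comparison, the shade-plugin approach being set aside here would amount to
a relocation rule along these lines (the org.spark-project.guava target package
is an illustrative assumption, not an actual build change):

```xml
<!-- Sketch of maven-shade-plugin relocation; bytecode references to
     com.google.common.* would be rewritten at package time. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.spark-project.guava</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

Re-publishing a relocated artifact instead gives the same package renaming but
with a real, debuggable jar that other builds can depend on directly.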
Then the issue is our Java API, because that currently exposes the Guava
Optional class directly under its original namespace. I see two options: (i)
change Spark's API to return a Spark-specific optional class, or (ii) inline
the definition of Guava's Optional (under its original namespace) in Spark's
source code - it's a very simple class and has been stable across several
versions of Guava.
The only risk with (ii) is that if Guava makes an incompatible change to
Optional, we are in trouble. If that happens, we could always fall back to (i)
though in a future release.
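To give a sense of how small the surface area is, here is a minimal sketch of
what a Spark-owned optional class under option (i) could look like. The name
SparkOptional and its method set are illustrative assumptions, not a proposed
API; it covers only the present/absent/get/or operations user code typically
relies on.

```java
// Minimal sketch of a Spark-specific optional type (hypothetical name).
// Declared package-private here for brevity; a real version would live in
// a Spark package and be public.
final class SparkOptional<T> {
    private final T value; // null encodes "absent"

    private SparkOptional(T value) {
        this.value = value;
    }

    // Wrap a non-null value; rejects null to keep present/absent unambiguous.
    public static <T> SparkOptional<T> of(T value) {
        if (value == null) {
            throw new NullPointerException("value must not be null");
        }
        return new SparkOptional<T>(value);
    }

    // The shared "no value" case.
    public static <T> SparkOptional<T> absent() {
        return new SparkOptional<T>(null);
    }

    public boolean isPresent() {
        return value != null;
    }

    // Returns the value, or fails loudly if absent.
    public T get() {
        if (value == null) {
            throw new IllegalStateException("value is absent");
        }
        return value;
    }

    // Returns the value if present, otherwise the supplied default.
    public T or(T defaultValue) {
        return value != null ? value : defaultValue;
    }
}
```

Option (ii) would look essentially the same, just with the class kept under
Guava's original package name so existing user code keeps compiling.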
> Change Spark build to minimize library conflicts
> ------------------------------------------------
>
> Key: SPARK-2420
> URL: https://issues.apache.org/jira/browse/SPARK-2420
> Project: Spark
> Issue Type: Wish
> Components: Build
> Affects Versions: 1.0.0
> Reporter: Xuefu Zhang
> Attachments: spark_1.0.0.patch
>
>
> During the prototyping of HIVE-7292, many library conflicts showed up because
> the Spark build contains versions of libraries that are vastly different from
> the current major Hadoop version. It would be nice if we could choose versions
> that are in line with Hadoop, or shade them in the assembly. Here is the wish
> list:
> 1. Upgrade the protobuf version from the current 2.4.1 to 2.5.0.
> 2. Shade Spark's Jetty and servlet dependencies in the assembly.
> 3. Resolve the Guava version difference. Spark is using a higher version, and
> I'm not sure what the best solution is for this.
> The list may grow as HIVE-7292 proceeds.
> For information only, the attached is a patch that we applied on Spark in
> order to make Spark work with Hive. It gives an idea of the scope of changes.
--
This message was sent by Atlassian JIRA
(v6.2#6252)