[
https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076456#comment-14076456
]
Marcelo Vanzin commented on SPARK-2420:
---------------------------------------
So let me see if I'm following things so far. The current proposals are 1.
downgrade or 2. "shade" (which, if I understand Patrick correctly, means forking
Guava and moving the sources to a different package, not using the Maven shade
plugin?).
Both options avoid overriding libraries used by Hadoop; the first by using the
same one, the second by avoiding the namespace conflict.
Option 1 introduces fewer backwards-compatibility issues. Shading just removes
Guava from the user's classpath, so it leaves users to manage it themselves:
they'll either inherit it from Hadoop, or end up overriding the classpath's
Guava with their own, which might break Hadoop. In both cases, I think the best
recommendation is to tell users to shade Guava in their application if they
really need a newer version - that way they won't be overriding the library
used by Hadoop classes.
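For reference, shading Guava inside an application build could look roughly
like the following maven-shade-plugin fragment. This is only a sketch: the
plugin version and the {{myapp.shaded}} relocation prefix are illustrative, not
something Spark prescribes.

```xml
<!-- Sketch: relocate Guava classes bundled in the application jar so they
     never shadow the Guava version on Hadoop's classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version> <!-- illustrative version -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrite com.google.common.* references in the app's own
                 bytecode to the relocated package. -->
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, the application's bundled Guava lives under its
own package and cannot conflict with whatever Guava version Hadoop provides.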
Option 1 is also less work; you don't need to maintain the shaded Guava fork
(if I understand correctly what was meant here by shading). Using Maven's shade
plugin instead would mean slower builds.
Also, does anyone know whether any of the libraries Spark depends on themselves
depend on Guava and need a version newer than 11? I haven't checked.
As for Guava leaking through Spark's API, that's very, very unfortunate. Option
2 here will definitely break compatibility for anyone who uses those APIs.
Option 1, on the other hand, has only a couple of implications: according to
Guava's javadoc, only one method doesn't exist in 11 ({{transform}}) and one
has a changed signature ({{presentInstances}} - only the generic arguments
changed, so it may still be binary compatible).
So, pending my dependency question above, I still think that downgrading is the
option that creates fewer headaches.
> Change Spark build to minimize library conflicts
> ------------------------------------------------
>
> Key: SPARK-2420
> URL: https://issues.apache.org/jira/browse/SPARK-2420
> Project: Spark
> Issue Type: Wish
> Components: Build
> Affects Versions: 1.0.0
> Reporter: Xuefu Zhang
> Attachments: spark_1.0.0.patch
>
>
> During the prototyping of HIVE-7292, many library conflicts showed up because
> the Spark build contains versions of libraries that are vastly different from
> the current major Hadoop version. It would be nice if we could choose versions
> in line with Hadoop's, or shade them in the assembly. Here is the wish
> list:
> 1. Upgrade the protobuf version from the current 2.4.1 to 2.5.0.
> 2. Shade Spark's Jetty and servlet dependencies in the assembly.
> 3. Resolve the Guava version difference. Spark is using a higher version; I'm
> not sure what the best solution for this is.
> The list may grow as HIVE-7292 proceeds.
> For information only, the attached is a patch that we applied to Spark in
> order to make it work with Hive. It gives an idea of the scope of the changes.
--
This message was sent by Atlassian JIRA
(v6.2#6252)