[ https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075755#comment-14075755 ]

Sean Owen commented on SPARK-2420:
----------------------------------

There aren't great answers to this one. I also ended up favoring downgrading as 
a path of least resistance. Here is the narrative behind my opinion:


This did come up as an issue when Guava was upgraded to 14 :)

It seems annoying that a dependency dictates a version of Guava, but c'est la 
vie for any dependency. It just happens that Guava is so common. 

Spark users are inevitably Hadoop users, so it's a dependency that exerts 
special influence.

I think this is being improved upstream in Hadoop by shading, but that doesn't 
help existing versions in the field, which will be around for years.

It is causing actual problems for users, and for future efforts that are 
probably important to Spark, such as Hive on Spark here.

Downgrading looks feasible. See my PR: 
https://github.com/apache/spark/pull/1610 *It does need review!*

Downgrading could break Spark apps that transitively depend on Guava 12+. That 
is really a problem with such an app, though, as it should depend on Guava 
directly rather than through Spark. Still, it's a point to consider.
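
For concreteness, here is a rough sketch of what "depend on Guava directly" 
means for an app author; the coordinates and version below are only 
illustrative, not a recommendation:

    // build.sbt (sketch): declare the Guava version the app is actually
    // written against, instead of inheriting whatever arrives transitively
    // through spark-core
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
      "com.google.guava"  % "guava"      % "14.0.1"
    )

With an explicit dependency like that, a change in the version Spark bundles 
becomes an ordinary dependency-resolution question rather than a silent 
behavior change.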

Can one justify downgrading a dependency between 1.x and 1.(x+1)? I think so, 
if you view it as more of a bug fix.

But why can't Spark shade Guava? This is also reasonable to consider. 
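
For concreteness, relocating Guava inside the assembly would look roughly like 
the sketch below. In a Maven build the equivalent is the maven-shade-plugin's 
relocation support; the sbt-assembly form is just shorter to show, and the 
relocated prefix is a placeholder, not a proposal:

    // sketch: rewrite Guava's packages inside the assembly jar so they no
    // longer collide with the Guava 11 on Hadoop's classpath
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.guava.@1").inAll
    )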

If you're worried about breaking apps, though, shading is the more breaking 
change, and I understand that not breaking apps is a high priority. Apps that 
rely on Guava transitively might otherwise continue to work just fine, but not 
if it disappears from Spark.

Shading is always a bit risky, since it can't always adjust every use of 
reflection or other reliance on package names inside the library. And of 
course you can end up with two copies of singleton classes if someone else 
brings their own Guava, which might or might not be OK. I don't have a 
specific problem in mind for Guava, though.
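
To make the reflection caveat concrete, a contrived example (not anything from 
Spark's own code): a reflective lookup hard-codes the original package name, 
so relocation can't rewrite it the way it rewrites ordinary bytecode 
references.

    // works when an unshaded Guava is on the classpath; after relocation only
    // the renamed class (e.g. org.sparkproject.guava.cache.CacheBuilder)
    // exists, so this throws ClassNotFoundException even though the library
    // itself is present
    val cls = Class.forName("com.google.common.cache.CacheBuilder")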

A more significant reason is that I'm still not 100% sure shading in Spark 
fixes the collision, at least in standalone mode. Spark apps that bring Guava 
14 may still collide with Hadoop's classpath, which contains 11.
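
As a hypothetical illustration of that collision: BaseEncoding only exists in 
Guava 14+, so an app compiled against the Guava 14 that Spark ships can still 
fail at runtime if Hadoop's Guava 11 ends up earlier on the classpath.

    import com.google.common.io.BaseEncoding  // class added in Guava 14

    // compiles against Guava 14; if the Guava 11 jar from Hadoop's classpath
    // wins at runtime, this line throws
    // java.lang.NoClassDefFoundError: com/google/common/io/BaseEncoding
    val encoded = BaseEncoding.base64().encode("spark".getBytes("UTF-8"))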

> Change Spark build to minimize library conflicts
> ------------------------------------------------
>
>                 Key: SPARK-2420
>                 URL: https://issues.apache.org/jira/browse/SPARK-2420
>             Project: Spark
>          Issue Type: Wish
>          Components: Build
>    Affects Versions: 1.0.0
>            Reporter: Xuefu Zhang
>         Attachments: spark_1.0.0.patch
>
>
> During the prototyping of HIVE-7292, many library conflicts showed up because 
> the Spark build contains versions of libraries that are vastly different from 
> the current major Hadoop version. It would be nice if we could choose versions 
> that are in line with Hadoop, or shade them in the assembly. Here is the wish 
> list:
> 1. Upgrade the protobuf version to 2.5.0 from the current 2.4.1.
> 2. Shade Spark's jetty and servlet dependencies in the assembly.
> 3. Resolve the guava version difference. Spark is using a higher version; I'm 
> not sure what the best solution for this is.
> The list may grow as HIVE-7292 proceeds.
> For information only, attached is a patch that we applied to Spark in order 
> to make Spark work with Hive. It gives an idea of the scope of the changes.


