[
https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076456#comment-14076456
]
Marcelo Vanzin commented on SPARK-2420:
---------------------------------------
So let me see if I'm following things so far. The current proposals are 1.
downgrade or 2. "shade" (which, if I understand Patrick correctly, means forking
Guava and moving the sources to a different package, not using the Maven shade
plugin?).
Both options avoid overriding libraries used by Hadoop; the first by using the
same one, the second by avoiding the namespace conflict.
Option 1 introduces fewer backwards-compatibility issues. Shading just removes
Guava from the user's classpath, so it leaves users to manage it themselves:
they'll either inherit it from Hadoop, or end up overriding the classpath's
Guava with their own, which might break Hadoop. In both cases, I think the best
recommendation is to tell users to shade Guava in their application if they
really need a newer version - that way they won't be overriding the library
used by Hadoop classes.
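For reference, shading Guava inside an application build could look roughly
like the following maven-shade-plugin fragment. This is only a sketch: the
plugin version and the {{myapp.shaded}} relocation prefix are illustrative, not
something Spark prescribes.

```xml
<!-- Sketch: relocate Guava classes bundled in the application jar so they
     never shadow the Guava version on Hadoop's classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version> <!-- illustrative version -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrite com.google.common.* references in the app's own
                 bytecode to the relocated package. -->
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, the application's bundled Guava lives under its
own package and cannot conflict with whatever Guava version Hadoop provides.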
Option 1 is also less work; you don't need to maintain the shaded Guava fork
(if I understand correctly what was meant here by shading). Using Maven's shade
plugin instead would mean slower builds.
Also, does anyone know whether any of the libraries Spark depends on themselves
depend on Guava and need a version newer than 11? I haven't checked.
As for Guava leaking through Spark's API, that's very, very unfortunate. Option
2 here will definitely break compatibility for anyone who uses those APIs.
Option 1, on the other hand, has only a couple of implications: according to
Guava's javadoc, only one method doesn't exist in 11 ({{transform}}) and one
has a changed signature ({{presentInstances}} - only the generic arguments
changed, so it may still be binary compatible).
So, pending my dependency question above, I still think that downgrading is the
option that creates fewer headaches.
> Change Spark build to minimize library conflicts
> ------------------------------------------------
>
> Key: SPARK-2420
> URL: https://issues.apache.org/jira/browse/SPARK-2420
> Project: Spark
> Issue Type: Wish
> Components: Build
> Affects Versions: 1.0.0
> Reporter: Xuefu Zhang
> Attachments: spark_1.0.0.patch
>
>
> During the prototyping of HIVE-7292, many library conflicts showed up because
> the Spark build contains versions of libraries that are vastly different from
> the current major Hadoop version. It would be nice if we could choose versions
> in line with Hadoop's, or shade them in the assembly. Here is the wish
> list:
> 1. Upgrade the protobuf version from the current 2.4.1 to 2.5.0.
> 2. Shade Spark's Jetty and servlet dependencies in the assembly.
> 3. Resolve the Guava version difference. Spark is using a higher version; I'm
> not sure what the best solution for this is.
> The list may grow as HIVE-7292 proceeds.
> For information only, the attached is a patch that we applied to Spark in
> order to make it work with Hive. It gives an idea of the scope of the changes.
--
This message was sent by Atlassian JIRA
(v6.2#6252)