[ 
https://issues.apache.org/jira/browse/SPARK-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995815#comment-13995815
 ] 

Sean Owen commented on SPARK-1802:
----------------------------------

I looked further into just what might go wrong by including hive-exec in the 
assembly, since hive-exec bundles its dependencies' classes directly (i.e. 
Maven can't manage around them).

Attached is a full dump of the conflicts.
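
For anyone who wants to reproduce this kind of overlap check by hand for one 
pair of jars, a rough sketch follows. It is not how the attached dump was 
produced. The function name and the listing file names are hypothetical, and 
it assumes each jar has first been listed to a text file, one entry per line.

```shell
#!/bin/sh
# Print the .class entries that appear in both jar listings.
# Each argument is a text file with one jar entry per line, for example:
#   unzip -Z1 hive-exec-0.12.0.jar > hive-exec.txt     (hypothetical file name)
#   unzip -Z1 protobuf-java-2.4.1.jar > protobuf.txt   (hypothetical file name)
# `jar tf` produces an equivalent listing if unzip is not available.
overlapping_classes() {
  a=$(mktemp) && b=$(mktemp) || return 1
  # Keep only class files; sort in the C locale so comm sees a consistent order.
  grep '\.class$' "$1" | LC_ALL=C sort -u > "$a"
  grep '\.class$' "$2" | LC_ALL=C sort -u > "$b"
  # comm -12 prints only the lines common to both sorted inputs.
  LC_ALL=C comm -12 "$a" "$b"
  rm -f "$a" "$b"
}
```

Calling `overlapping_classes hive-exec.txt protobuf.txt | wc -l` gives a count 
comparable to the numbers in the warnings below. It only compares entry names, 
so it says nothing about whether the two copies of a class are the same version.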

The ones that are potential issues appear to be the following, and one looks 
like it could be a deal-breaker -- protobuf -- since it's neither forwards nor 
backwards compatible. That is, I recommend testing this assembly with an older 
Hadoop that needs protobuf 2.4.1 to see if it croaks.

The rest might be worked around but need some additional mojo to make sure the 
right version wins in the packaging.
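
One rough way to check which copy actually won, sketched under some 
assumptions: the assembly and each candidate jar have been unpacked into 
directories (for example with `unzip -q some.jar -d some-dir`), and the 
packaging copies class files byte for byte, which would not hold if classes 
were relocated or otherwise rewritten. The function and directory names are 
hypothetical.

```shell
#!/bin/sh
# Print every candidate directory whose copy of a class entry is
# byte-identical to the copy in the unpacked assembly.
# Arguments: class entry path, unpacked assembly dir, then one or more
# unpacked candidate jar dirs (all hypothetical names).
which_copy_won() {
  entry=$1
  assembly=$2
  shift 2
  for candidate in "$@"; do
    # cmp -s is silent and exits 0 only when the two files are identical.
    if cmp -s "$assembly/$entry" "$candidate/$entry"; then
      echo "$candidate"
    fi
  done
}
```

For example, `which_copy_won com/google/common/base/Optional.class assembly 
hive-exec guava-14.0.1` would print the directory or directories whose copy 
matches. No output means none of the candidates supplied that class as packaged.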

Certainly having hive-exec in the build is making me queasy!


[WARNING] hive-exec-0.12.0.jar, libthrift-0.9.0.jar define 153 overlappping 
classes: 

HBase includes libthrift-0.8.0, but it's in examples, so I figure this is 
ignorable.


[WARNING] hive-exec-0.12.0.jar, commons-lang-2.4.jar define 2 overlappping 
classes: 

Probably ignorable, but we have to make sure commons-lang-3.3.2 'wins' in the 
build.


[WARNING] hive-exec-0.12.0.jar, jackson-core-asl-1.9.11.jar define 117 
overlappping classes: 
[WARNING] hive-exec-0.12.0.jar, jackson-mapper-asl-1.8.8.jar define 432 
overlappping classes: 

I believe these are ignorable. (Not sure why the jackson versions are 
mismatched; another to-do.)


[WARNING] hive-exec-0.12.0.jar, guava-14.0.1.jar define 1087 overlappping 
classes: 

Should be OK. Hive uses 11.0.2 like Hadoop; the build is already taking that 
particular risk. We need 14.0.1 to win.


[WARNING] hive-exec-0.12.0.jar, protobuf-java-2.4.1.jar define 204 overlappping 
classes: 

Oof. Hive has protobuf 2.5.0. This has got to be a problem for older Hadoop 
builds?



> Audit dependency graph when Spark is built with -Phive
> ------------------------------------------------------
>
>                 Key: SPARK-1802
>                 URL: https://issues.apache.org/jira/browse/SPARK-1802
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Patrick Wendell
>            Assignee: Sean Owen
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> I'd like to have binary release for 1.0 include Hive support. Since this 
> isn't enabled by default in the build I don't think it's as well tested, so 
> we should dig around a bit and decide if we need to e.g. add any excludes.
> {code}
> $ mvn install -Phive -DskipTests && mvn dependency:build-classpath -pl 
> assembly | grep -v INFO | tr ":" "\n" |  awk ' { FS="/"; print ( $(NF) ); }' 
> | sort > without_hive.txt
> $ mvn install -Phive -DskipTests && mvn dependency:build-classpath -Phive -pl 
> assembly | grep -v INFO | tr ":" "\n" |  awk ' { FS="/"; print ( $(NF) ); }' 
> | sort > with_hive.txt
> $ diff without_hive.txt with_hive.txt
> < antlr-2.7.7.jar
> < antlr-3.4.jar
> < antlr-runtime-3.4.jar
> 10,14d6
> < avro-1.7.4.jar
> < avro-ipc-1.7.4.jar
> < avro-ipc-1.7.4-tests.jar
> < avro-mapred-1.7.4.jar
> < bonecp-0.7.1.RELEASE.jar
> 22d13
> < commons-cli-1.2.jar
> 25d15
> < commons-compress-1.4.1.jar
> 33,34d22
> < commons-logging-1.1.1.jar
> < commons-logging-api-1.0.4.jar
> 38d25
> < commons-pool-1.5.4.jar
> 46,49d32
> < datanucleus-api-jdo-3.2.1.jar
> < datanucleus-core-3.2.2.jar
> < datanucleus-rdbms-3.2.1.jar
> < derby-10.4.2.0.jar
> 53,57d35
> < hive-common-0.12.0.jar
> < hive-exec-0.12.0.jar
> < hive-metastore-0.12.0.jar
> < hive-serde-0.12.0.jar
> < hive-shims-0.12.0.jar
> 60,61d37
> < httpclient-4.1.3.jar
> < httpcore-4.1.3.jar
> 68d43
> < JavaEWAH-0.3.2.jar
> 73d47
> < javolution-5.5.1.jar
> 76d49
> < jdo-api-3.0.1.jar
> 78d50
> < jetty-6.1.26.jar
> 87d58
> < jetty-util-6.1.26.jar
> 93d63
> < json-20090211.jar
> 98d67
> < jta-1.1.jar
> 103,104d71
> < libfb303-0.9.0.jar
> < libthrift-0.9.0.jar
> 112d78
> < mockito-all-1.8.5.jar
> 136d101
> < servlet-api-2.5-20081211.jar
> 139d103
> < snappy-0.2.jar
> 144d107
> < spark-hive_2.10-1.0.0.jar
> 151d113
> < ST4-4.0.4.jar
> 153d114
> < stringtemplate-3.2.1.jar
> 156d116
> < velocity-1.7.jar
> 158d117
> < xz-1.0.jar
> {code}
> Some initial investigation suggests we may need to take some precaution 
> surrounding (a) jetty and (b) servlet-api.



