[
https://issues.apache.org/jira/browse/SPARK-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996178#comment-13996178
]
Patrick Wendell edited comment on SPARK-1802 at 5/13/14 8:18 AM:
-----------------------------------------------------------------
This protobuf thing is very troubling. The options here are pretty limited
since the Hive project publishes this assembly jar. I see a few:

1. Publish a Hive 0.12 that uses our shaded protobuf 2.4.1 (we already
published a shaded version of protobuf 2.4.1). I actually have this working in
a local build of Hive 0.12, but I haven't tried to push it to Sonatype yet:
https://github.com/pwendell/hive/commits/branch-0.12-shaded-protobuf

2. Upgrade our use of Hive to 0.13 (which bumps to protobuf 2.5.0) and only
support Spark SQL with Hadoop 2+ - that is, versions of Hadoop that have also
bumped to protobuf 2.5.0. I'm not sure how big an effort the code changes
between 0.12 and 0.13 would be; Spark didn't recompile trivially against
0.13. I can talk to Michael Armbrust tomorrow morning about this.
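For context on option 1, "shading" means relocating the protobuf classes under a Spark-owned package at build time, so Hive's protobuf 2.4.1 bytecode can never collide with Hadoop's protobuf 2.5.0 on the classpath. A hypothetical maven-shade-plugin sketch of the kind of relocation involved (the shaded package name here is illustrative; the actual coordinates in the branch above may differ):

```xml
<!-- Hypothetical relocation config: rewrite com.google.protobuf under a
     Spark-owned package so Hive 0.12's protobuf 2.4.1 cannot collide
     with Hadoop 2's protobuf 2.5.0 in the same JVM. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.spark.shaded.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```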
One thing I don't totally understand is how Hive itself deals with this
conflict - for instance, when someone runs Hive 0.12 on Hadoop 2.
Presumably both Hive's protobuf 2.4.1 and the HDFS client's protobuf 2.5.0
will be in the JVM at the same time, and I'm not sure how they are isolated
from each other. HDP 2.1, for instance, seems to ship both
(http://hortonworks.com/hdp/whats-new/).
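On the isolation question: a flat JVM classpath provides no isolation at all - the first copy of a class wins and silently shadows the rest, which is why distros typically rely on separate classloaders or shading (as in option 1). Python's sys.path resolves the same first-match-wins way, so a rough analogue, with hypothetical stub modules standing in for the two protobuf copies:

```python
import os
import sys
import tempfile

# Two "jars", each providing the same package name with a different
# version: stand-ins for protobuf 2.4.1 (bundled in hive-exec) and
# protobuf 2.5.0 (needed by the Hadoop 2 HDFS client).
hive_jar = tempfile.mkdtemp()
hadoop_jar = tempfile.mkdtemp()
with open(os.path.join(hive_jar, "protobuf_stub.py"), "w") as f:
    f.write("VERSION = '2.4.1'\n")
with open(os.path.join(hadoop_jar, "protobuf_stub.py"), "w") as f:
    f.write("VERSION = '2.5.0'\n")

# A flat search path offers no isolation: whichever copy appears first
# shadows the other, just like first-wins resolution on a JVM classpath.
sys.path.insert(0, hadoop_jar)
sys.path.insert(0, hive_jar)  # Hive's copy ends up first
import protobuf_stub

print(protobuf_stub.VERSION)  # -> 2.4.1; the 2.5.0 copy is invisible
```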
> Audit dependency graph when Spark is built with -Phive
> ------------------------------------------------------
>
> Key: SPARK-1802
> URL: https://issues.apache.org/jira/browse/SPARK-1802
> Project: Spark
> Issue Type: Bug
> Reporter: Patrick Wendell
> Assignee: Sean Owen
> Priority: Blocker
> Fix For: 1.0.0
>
> Attachments: hive-exec-jar-problems.txt
>
>
> I'd like the binary release for 1.0 to include Hive support. Since this
> isn't enabled by default in the build, I don't think it's as well tested, so
> we should dig around a bit and decide if we need to e.g. add any excludes.
> {code}
> $ mvn install -Phive -DskipTests && mvn dependency:build-classpath -pl
> assembly | grep -v INFO | tr ":" "\n" | awk ' { FS="/"; print ( $(NF) ); }'
> | sort > without_hive.txt
> $ mvn install -Phive -DskipTests && mvn dependency:build-classpath -Phive -pl
> assembly | grep -v INFO | tr ":" "\n" | awk ' { FS="/"; print ( $(NF) ); }'
> | sort > with_hive.txt
> $ diff without_hive.txt with_hive.txt
> < antlr-2.7.7.jar
> < antlr-3.4.jar
> < antlr-runtime-3.4.jar
> 10,14d6
> < avro-1.7.4.jar
> < avro-ipc-1.7.4.jar
> < avro-ipc-1.7.4-tests.jar
> < avro-mapred-1.7.4.jar
> < bonecp-0.7.1.RELEASE.jar
> 22d13
> < commons-cli-1.2.jar
> 25d15
> < commons-compress-1.4.1.jar
> 33,34d22
> < commons-logging-1.1.1.jar
> < commons-logging-api-1.0.4.jar
> 38d25
> < commons-pool-1.5.4.jar
> 46,49d32
> < datanucleus-api-jdo-3.2.1.jar
> < datanucleus-core-3.2.2.jar
> < datanucleus-rdbms-3.2.1.jar
> < derby-10.4.2.0.jar
> 53,57d35
> < hive-common-0.12.0.jar
> < hive-exec-0.12.0.jar
> < hive-metastore-0.12.0.jar
> < hive-serde-0.12.0.jar
> < hive-shims-0.12.0.jar
> 60,61d37
> < httpclient-4.1.3.jar
> < httpcore-4.1.3.jar
> 68d43
> < JavaEWAH-0.3.2.jar
> 73d47
> < javolution-5.5.1.jar
> 76d49
> < jdo-api-3.0.1.jar
> 78d50
> < jetty-6.1.26.jar
> 87d58
> < jetty-util-6.1.26.jar
> 93d63
> < json-20090211.jar
> 98d67
> < jta-1.1.jar
> 103,104d71
> < libfb303-0.9.0.jar
> < libthrift-0.9.0.jar
> 112d78
> < mockito-all-1.8.5.jar
> 136d101
> < servlet-api-2.5-20081211.jar
> 139d103
> < snappy-0.2.jar
> 144d107
> < spark-hive_2.10-1.0.0.jar
> 151d113
> < ST4-4.0.4.jar
> 153d114
> < stringtemplate-3.2.1.jar
> 156d116
> < velocity-1.7.jar
> 158d117
> < xz-1.0.jar
> {code}
> Some initial investigation suggests we may need to take some precaution
> surrounding (a) jetty and (b) servlet-api.
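The jar-basename diff in the shell pipeline above can also be sketched in Python; a small illustration using made-up classpath strings in place of the `mvn dependency:build-classpath` output:

```python
def jar_names(classpath):
    # Equivalent of the shell pipeline: split the colon-separated
    # classpath, keep each entry's basename, and sort.
    return sorted(p.rsplit("/", 1)[-1] for p in classpath.split(":") if p)

# Hypothetical classpaths standing in for the two build-classpath runs.
without_hive = "/repo/commons-cli-1.2.jar:/repo/guava-14.0.1.jar"
with_hive = without_hive + ":/repo/org/apache/hive/hive-exec-0.12.0.jar"

# Jars that only appear when -Phive is enabled (the `<` lines above).
hive_only = sorted(set(jar_names(with_hive)) - set(jar_names(without_hive)))
print(hive_only)  # -> ['hive-exec-0.12.0.jar']
```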
--
This message was sent by Atlassian JIRA
(v6.2#6252)