[ https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723802#comment-14723802 ]

Marcelo Vanzin commented on SPARK-10374:
----------------------------------------

BTW, since akka depends on protobuf, one cannot simply override the dependency, 
since then akka might break. Does anyone know the extent of akka's use of 
protobuf?
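
One way to gauge that would be to scan the akka-remote jar for references to 
protobuf classes; a rough sketch (assuming a JDK 8+ jdeps on the path, and the 
akka-remote 2.3.11 jar named as in the dependency trees below):
{code}
# hypothetical check: list akka-remote classes that reference com.google.protobuf types
jdeps -verbose:class akka-remote_2.10-2.3.11.jar | grep com.google.protobuf
{code}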

This does sound pretty bad (it may make Spark's hadoop-1 builds unusable, at 
least in certain situations).

> Spark-core 1.5.0-RC2 can create version conflicts with apps depending on 
> protobuf-2.4
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-10374
>                 URL: https://issues.apache.org/jira/browse/SPARK-10374
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Matt Cheah
>
> My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that 
> depends, via Gradle, on the Spark 1.5.0 libraries and the Hadoop 2.0.0 libraries. 
> When I run the driver application, I can hit the following error:
> {code}
> <redacted other messages>… java.lang.UnsupportedOperationException: This is 
> supposed to be overridden by subclasses.
>         at 
> com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108)
>         at 
> com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
> {code}
> This application used to work when pulling in Spark 1.4.1 dependencies, and 
> thus this is a regression.
> I used Gradle’s dependencyInsight task to dig a bit deeper. Against our Spark 
> 1.4.1-backed project, it shows that dependency resolution pulls in Protobuf 
> 2.4.0a from the Hadoop CDH4 modules and Protobuf 2.5.0-spark from the Spark 
> modules. It appears that Spark used to shade its protobuf dependencies and 
> hence Spark’s and Hadoop’s protobuf dependencies wouldn’t collide. However, 
> when I ran dependencyInsight against Spark 1.5, it looks like protobuf is no 
> longer shaded in the Spark modules.
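> For reference, a dependencyInsight report like the ones below can be produced 
> with an invocation along these lines (a sketch; the configuration name is an 
> assumption and may differ per project):
> {code}
> ./gradlew dependencyInsight --configuration compile --dependency protobuf-java
> {code}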
> 1.4.1 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.4.0a
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.4.1
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.4.1
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.4.1
> |                   \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> org.spark-project.protobuf:protobuf-java:2.5.0-spark
> \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark
>      \--- org.apache.spark:spark-core_2.10:1.4.1
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.4.1
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.4.1
>                \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> {code}
> 1.5.0-rc2 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.5.0 (conflict resolution)
> \--- com.typesafe.akka:akka-remote_2.10:2.3.11
>      \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
>                \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
> |                   \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> {code}
> Clearly we can't force the version to be one way or the other. If I force 
> protobuf to use 2.5.0, then invoking Hadoop code from my application will 
> break as Hadoop 2.0.0 jars are compiled against protobuf-2.4. On the other 
> hand, forcing protobuf to use version 2.4 breaks spark-core code that is 
> compiled against protobuf-2.5. Note that protobuf-2.4 and protobuf-2.5 are 
> not binary compatible.
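> To make the "forcing" concrete, a minimal Gradle sketch of pinning protobuf 
> (illustrative only; as described above, either choice breaks one side):
> {code}
> // build.gradle sketch: pin a single protobuf version across all configurations
> // (pinning 2.4.0a instead would break spark-core / akka-remote the same way)
> configurations.all {
>     resolutionStrategy {
>         force 'com.google.protobuf:protobuf-java:2.5.0'
>     }
> }
> {code}
> (Shading, as Spark 1.4.1 appears to have done, is the usual way out of this 
> kind of collision.)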


