[ https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-10374:
------------------------------
    Priority: Major  (was: Blocker)
Fix Version/s:     (was: 1.5.0)

([~mcheah] don't set Fix Version or Blocker: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark)

I think the more basic issue is that you have a build for Hadoop 2.2+ and are using 2.0. The artifacts in Maven won't necessarily work for you; you need something like the cdh4 profile and a custom build ... but here I think it's the akka dependency that would need the custom build. Also, I'm not clear whether you marked the dependencies as "provided", although I don't know that that's the issue. TBH I don't know that Spark necessarily works with Hadoop 2.0.0 at all; Spark 1.4 didn't fully work with Hadoop 1.x.

> Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-10374
>                 URL: https://issues.apache.org/jira/browse/SPARK-10374
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Matt Cheah
>
> My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that depends on the Spark 1.5.0 libraries (via Gradle) and the Hadoop 2.0.0 libraries. When I run the driver application, I can hit the following error:
> {code}
> <redacted other messages>…
> java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
>         at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108)
>         at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
> {code}
> This application used to work when pulling in the Spark 1.4.1 dependencies, so this is a regression.
> I used Gradle's dependencyInsight task to dig a bit deeper. Against our Spark 1.4.1-backed project, it shows that dependency resolution pulls in protobuf-java 2.4.0a from the Hadoop CDH4 modules and protobuf-java 2.5.0-spark from the Spark modules. It appears that Spark used to shade its protobuf dependency, so Spark's and Hadoop's protobuf versions wouldn't collide. However, when I ran dependencyInsight again against Spark 1.5, it looks like protobuf is no longer shaded in the Spark modules.
> 1.4.1 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.4.0a
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.4.1
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.4.1
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.4.1
> |                   \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
>
> org.spark-project.protobuf:protobuf-java:2.5.0-spark
> \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark
>      \--- org.apache.spark:spark-core_2.10:1.4.1
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.4.1
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.4.1
>                \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> {code}
> 1.5.0-rc2 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.5.0 (conflict resolution)
> \--- com.typesafe.akka:akka-remote_2.10:2.3.11
>      \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
>                \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
>
> com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
> |                   \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> {code}
> Clearly we can't force the version to be one way or the other. If I force protobuf to 2.5.0, then invoking Hadoop code from my application breaks, because the Hadoop 2.0.0 jars are compiled against protobuf-2.4. On the other hand, forcing protobuf to 2.4 breaks spark-core code that is compiled against protobuf-2.5. Note that protobuf-2.4 and protobuf-2.5 are not binary compatible.
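To illustrate the "provided" suggestion in the comment above: a minimal Gradle sketch, assuming a Gradle version that has the compileOnly configuration (older Gradle needed the propdeps plugin for an equivalent provided scope). The coordinates are taken from the trees above; treat this as a sketch of the idea, not a verified fix for this issue.

{code}
// build.gradle -- minimal sketch of a "provided"-style Spark dependency.
// Assumes a Gradle version with the compileOnly configuration.
apply plugin: 'java'

repositories {
    mavenCentral()
}

dependencies {
    // Spark is supplied by the cluster at runtime: compile against it,
    // but keep it (and the protobuf-2.5 it pulls in via akka) off the
    // application's packaged runtime classpath.
    compileOnly 'org.apache.spark:spark-core_2.10:1.5.0-rc2'
    compileOnly 'org.apache.spark:spark-sql_2.10:1.5.0-rc2'

    // Hadoop CDH4 client, which depends on protobuf-java:2.4.0a.
    compile 'org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0'
}
{code}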
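For reference, "forcing the version one way or the other" in Gradle means a resolutionStrategy override like the sketch below. As the description explains, neither choice works here: the Hadoop 2.0.0 jars need protobuf-2.4, spark-core's akka needs protobuf-2.5, and the two are not binary compatible.

{code}
// build.gradle -- sketch of the version forcing the description says cannot work.
configurations.all {
    resolutionStrategy {
        // Forcing 2.5.0 breaks the CDH4 Hadoop jars (compiled against 2.4);
        // forcing 2.4.0a instead breaks spark-core (compiled against 2.5).
        force 'com.google.protobuf:protobuf-java:2.5.0'
    }
}
{code}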
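One workaround in the spirit of the shading the description mentions (Spark 1.4.1 shipped a relocated org.spark-project.protobuf:protobuf-java:2.5.0-spark via its akka fork, as the first tree above shows) would be for the application to relocate the protobuf-2.4 classes itself. A hedged sketch using the Gradle shadow plugin; the plugin version and the myapp.shaded package name are illustrative assumptions, and this has not been verified against this issue.

{code}
// build.gradle -- hypothetical relocation sketch with the Gradle shadow plugin.
buildscript {
    repositories { jcenter() }
    dependencies {
        // Plugin version is an assumption for the Gradle 2.x era.
        classpath 'com.github.jengelman.gradle.plugins:shadow:1.2.2'
    }
}

apply plugin: 'java'
apply plugin: 'com.github.johnrengelman.shadow'

shadowJar {
    // Rewrite com.google.protobuf.* -- and every reference to it in the
    // bundled classes -- into a private package, so the 2.4 classes cannot
    // collide with the protobuf-2.5 that Spark's akka needs.
    relocate 'com.google.protobuf', 'myapp.shaded.com.google.protobuf'
}
{code}

This only helps if the Hadoop client jars are bundled into the shadow jar, so that their protobuf references are rewritten along with the relocated classes.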