I agree we should get serialization addressed in TinkerPop. I'm still a bit
surprised that Spark had this problem, considering its popularity, so it
proves there are idiots everywhere ;)

On Fri, Jan 29, 2016 at 12:10 PM Marko Rodriguez <okramma...@gmail.com> wrote:
> Hi Jason,
>
>> In the meantime, it sounds like you have to match the compiled Spark
>> version with the runtime. I saw a bunch of posts and a couple of JIRAs
>> where they always came back to that as the solution.
>
> So what's the deal for us? I say we release with Spark 1.5.2 since it's
> a minor bump, and if there is a "jar swap" trick that works for people,
> that's that.
>
>> Wonder how exposed TinkerPop is with Serializable and serialVersionUIDs.
>
> Dan LaRocque was basically saying we are idiots for not using
> serialVersionUIDs. I didn't even know what that was all about until he
> told me. I think we DEFINITELY need to get that solid for 3.2.0.
>
> Thoughts?,
> Marko.
>
> On Thu, Jan 28, 2016 at 4:10 PM, Jason Plurad <plur...@gmail.com> wrote:
>
>> Yeah, I was surprised about the incompatibility. It seems contained to
>> the standalone Spark server deployment only.
>>
>> You can reproduce the same stack trace with their Spark Pi example on
>> standalone Spark servers (try to run Pi from 1.5.2 on a 1.5.1
>> standalone, or Pi from 1.5.1 on a 1.5.2 standalone).
>>
>> yarn-client and local tested out fine.
>>
>> I'll post out on the Spark list and see what they come back with.
>>
>> On Thu, Jan 28, 2016 at 3:51 PM, Marko Rodriguez <okramma...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> This is odd. We are currently doing TinkerPop 3.1.1-SNAPSHOT + Spark
>>> 1.5.2 2-billion-edge benchmarking (against SparkServer) and all is
>>> good.
>>>
>>> Are you saying that Spark 1.5.1 and Spark 1.5.2 are incompatible?
>>> That's a bummer.
>>>
>>> I don't think there is an "official policy," but I always bump minor
>>> release versions with minor release versions. That is, I didn't bump
>>> to Spark 1.6.0 (we will do that for TinkerPop 3.2.0), but since 1.5.1
>>> to 1.5.2 is a minor bump, I bumped. We have always done that -- e.g.
>>> Neo4j, Hadoop, various Java libraries…
>>>
>>> Thoughts?,
>>> Marko.
>>>
>>> http://markorodriguez.com
>>>
>>> On Jan 28, 2016, at 1:48 PM, Jason Plurad <plur...@gmail.com> wrote:
>>>
>>>> We're running into this error with standalone Spark clusters
>>>> <http://spark.apache.org/docs/1.5.2/spark-standalone.html>.
>>>>
>>>> ```
>>>> WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in
>>>> stage 0.0 (TID 0, 192.168.14.103): java.io.InvalidClassException:
>>>> org.apache.spark.rdd.RDD; local class incompatible: stream classdesc
>>>> serialVersionUID = -3343649307726848892, local class serialVersionUID
>>>> = -3996494161745401652
>>>>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
>>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
>>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
>>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
>>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> ```
>>>>
>>>> You can reproduce this error two ways:
>>>> * Run a SparkGraphComputer from TinkerPop 3.1.0-incubating against a
>>>>   Spark 1.5.2 standalone cluster
>>>> * Run a SparkGraphComputer from TinkerPop 3.1.1-SNAPSHOT against a
>>>>   Spark 1.5.1 standalone cluster
>>>>
>>>> Only the standalone Spark cluster breaks -- its version must match
>>>> exactly the Spark version TinkerPop is built against.
>>>>
>>>> This commit
>>>> <https://github.com/apache/incubator-tinkerpop/commit/78b10569755070b088c460341bb473112dfe3ffe#diff-402e09222db9327564f28924e1b39d0c>
>>>> bumped the Spark version from 1.5.1 to 1.5.2. As Marko mentioned, it
>>>> does pass the unit tests, but the unit tests are run with
>>>> `spark.master=local`. I've tested that it also works with
>>>> `spark.master=yarn-client`.
>>>>
>>>> What is -- or rather, what should be -- the direction/policy for
>>>> dependency version upgrades in TinkerPop?
>>>>
>>>> -- Jason
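To make Jason's reproduction concrete: the variable that decides success or
failure is `spark.master`. Below is a minimal sketch of submitting a
`SparkGraphComputer` job against a standalone master, assuming the 3.1.x-era
Hadoop Gremlin configuration keys (`gremlin.hadoop.graphInputFormat` and
friends); the master URL and input path are placeholders, not values from
the thread.

```java
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.commons.configuration.Configuration;
import org.apache.tinkerpop.gremlin.process.computer.ComputerResult;
import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

public class StandaloneSparkRepro {
    public static void main(final String[] args) throws Exception {
        final Configuration conf = new BaseConfiguration();
        conf.setProperty("gremlin.graph",
                "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph");
        conf.setProperty("gremlin.hadoop.graphInputFormat",
                "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat");
        conf.setProperty("gremlin.hadoop.graphOutputFormat",
                "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat");
        conf.setProperty("gremlin.hadoop.inputLocation", "tinkerpop-modern.kryo"); // placeholder
        conf.setProperty("gremlin.hadoop.outputLocation", "output");               // placeholder

        // The failure mode tracks spark.master: "local[4]" and "yarn-client"
        // were reported fine across 1.5.1/1.5.2, while a standalone master
        // URL like this one (placeholder host) hits the InvalidClassException
        // unless the cluster's Spark version matches the classpath's.
        conf.setProperty("spark.master", "spark://spark-master.example.com:7077");
        conf.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

        final Graph graph = GraphFactory.open(conf);
        final ComputerResult result = graph.compute(SparkGraphComputer.class)
                .program(PageRankVertexProgram.build().create(graph))
                .submit().get();
        System.out.println("runtime (ms): " + result.memory().getRuntime());
    }
}
```

The pattern fits the thread's observations: in `local` mode (and, with the
client shipping its own jars, `yarn-client`) the task bytes are deserialized
by the same Spark classes that serialized them, while a standalone cluster
deserializes tasks with its own installed Spark jars -- presumably where the
two `serialVersionUID`s for `org.apache.spark.rdd.RDD` collide.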
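And the fix the thread converges on for 3.2.0: pin `serialVersionUID`
explicitly. When a `Serializable` class omits the field, the JVM derives one
from the class's structure (fields, methods, interfaces), so an innocuous
recompile or dependency bump can change it and break wire compatibility --
exactly the `InvalidClassException` above. A minimal sketch with a
hypothetical class (`TraversalPayload` is illustrative, not a real TinkerPop
type):

```java
import java.io.Serializable;

// Sketch of the practice Marko is describing, on a made-up class.
public class TraversalPayload implements Serializable {

    // Without this field, the JVM computes a serialVersionUID from the
    // class structure at runtime. Any structural change -- or even a
    // different compiler -- can alter the computed value, and deserializing
    // an old stream then fails with "local class incompatible: stream
    // classdesc serialVersionUID = ..., local class serialVersionUID = ...".
    // Declaring it pins the contract: bump it only on genuinely
    // incompatible changes.
    private static final long serialVersionUID = 1L;

    private final String label;

    public TraversalPayload(final String label) {
        this.label = label;
    }

    public String getLabel() {
        return label;
    }
}
```

The JDK's `serialver` tool prints the UID a class gets by default, which
makes it easy to see how two builds of the "same" class -- say, Spark 1.5.1
and 1.5.2 -- can end up disagreeing.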