I agree we should get serialization addressed in TinkerPop. I'm still a bit
surprised that Spark had this problem, considering its popularity, so it
proves there are idiots everywhere ;)

On Fri, Jan 29, 2016 at 12:10 PM Marko Rodriguez <okramma...@gmail.com> wrote:
> Hi Jason,
>
>> In the meantime, it sounds like you have to match the compiled Spark
>> version with the runtime. I saw a bunch of posts and a couple of JIRAs
>> where they always came back to that as the solution.
>
> So what's the deal for us? I say we release with Spark 1.5.2 since it's
> a minor bump, and if there is a "jar swap" trick that works for people,
> that's that.
>
>> Wonder how exposed TinkerPop is with Serializable and serialVersionUIDs.
>
> Dan LaRocque was basically saying we are idiots for not using
> serialVersionUIDs. I didn't even know what that was all about until he
> told me. I think we DEFINITELY need to get that solid for 3.2.0.
>
> Thoughts?,
> Marko.
>
> On Thu, Jan 28, 2016 at 4:10 PM, Jason Plurad <plur...@gmail.com> wrote:
>
>> Yeah, I was surprised about the incompatibility. It seems contained to
>> the standalone Spark server deployment only.
>>
>> You can reproduce the same stack trace with their Spark Pi example on
>> standalone Spark servers (try to run Pi from 1.5.2 on a 1.5.1
>> standalone, or Pi from 1.5.1 on a 1.5.2 standalone).
>>
>> yarn-client and local tested out fine.
>>
>> I'll post out on the Spark list and see what they come back with.
>>
>> On Thu, Jan 28, 2016 at 3:51 PM, Marko Rodriguez <okramma...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> This is odd. We are currently doing TinkerPop 3.1.1-SNAPSHOT + Spark
>>> 1.5.2 2-billion-edge benchmarking (against SparkServer) and all is
>>> good.
>>>
>>> Are you saying that Spark 1.5.1 and Spark 1.5.2 are incompatible?
>>> That's a bummer.
>>>
>>> I don't think there is an "official policy," but I always bump minor
>>> release versions with minor release versions. That is, I didn't bump
>>> to Spark 1.6.0 (we will do that for TinkerPop 3.2.0), but since 1.5.1
>>> to 1.5.2 is a minor bump, I bumped. We have always done that -- e.g.
>>> Neo4j, Hadoop, various Java libraries…
>>>
>>> Thoughts?,
>>> Marko.
>>>
>>> http://markorodriguez.com
>>>
>>> On Jan 28, 2016, at 1:48 PM, Jason Plurad <plur...@gmail.com> wrote:
>>>
>>>> We're running into this error with standalone Spark clusters
>>>> <http://spark.apache.org/docs/1.5.2/spark-standalone.html>.
>>>>
>>>> ```
>>>> WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in
>>>> stage 0.0 (TID 0, 192.168.14.103): java.io.InvalidClassException:
>>>> org.apache.spark.rdd.RDD; local class incompatible: stream classdesc
>>>> serialVersionUID = -3343649307726848892, local class serialVersionUID
>>>> = -3996494161745401652
>>>>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
>>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
>>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
>>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
>>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> ```
>>>>
>>>> You can reproduce this error two ways:
>>>> * Run a SparkGraphComputer from TinkerPop 3.1.0-incubating against a
>>>>   Spark 1.5.2 standalone cluster
>>>> * Run a SparkGraphComputer from TinkerPop 3.1.1-SNAPSHOT against a
>>>>   Spark 1.5.1 standalone cluster
>>>>
>>>> Only the standalone Spark cluster breaks -- its version must match
>>>> exactly the Spark version TinkerPop is built against.
>>>>
>>>> This commit
>>>> <https://github.com/apache/incubator-tinkerpop/commit/78b10569755070b088c460341bb473112dfe3ffe#diff-402e09222db9327564f28924e1b39d0c>
>>>> bumped the Spark version from 1.5.1 to 1.5.2. As Marko mentioned, it
>>>> does pass the unit tests, but the unit tests are run with
>>>> `spark.master=local`. I've tested that it also works with
>>>> `spark.master=yarn-client`.
>>>>
>>>> What is -- or rather, what should be -- the direction/policy for
>>>> dependency version upgrades in TinkerPop?
>>>>
>>>> -- Jason
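To make Jason's reproduction concrete: the variable that decides success or
failure is `spark.master`. Below is a minimal sketch of submitting a
`SparkGraphComputer` job against a standalone master, assuming the 3.1.x-era
Hadoop Gremlin configuration keys (`gremlin.hadoop.graphInputFormat` and
friends); the master URL and input path are placeholders, not values from
the thread.

```java
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.commons.configuration.Configuration;
import org.apache.tinkerpop.gremlin.process.computer.ComputerResult;
import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

public class StandaloneSparkRepro {
    public static void main(final String[] args) throws Exception {
        final Configuration conf = new BaseConfiguration();
        conf.setProperty("gremlin.graph",
                "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph");
        conf.setProperty("gremlin.hadoop.graphInputFormat",
                "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat");
        conf.setProperty("gremlin.hadoop.graphOutputFormat",
                "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat");
        conf.setProperty("gremlin.hadoop.inputLocation", "tinkerpop-modern.kryo"); // placeholder
        conf.setProperty("gremlin.hadoop.outputLocation", "output");               // placeholder

        // The failure mode tracks spark.master: "local[4]" and "yarn-client"
        // were reported fine across 1.5.1/1.5.2, while a standalone master
        // URL like this one (placeholder host) hits the InvalidClassException
        // unless the cluster's Spark version matches the classpath's.
        conf.setProperty("spark.master", "spark://spark-master.example.com:7077");
        conf.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

        final Graph graph = GraphFactory.open(conf);
        final ComputerResult result = graph.compute(SparkGraphComputer.class)
                .program(PageRankVertexProgram.build().create(graph))
                .submit().get();
        System.out.println("runtime (ms): " + result.memory().getRuntime());
    }
}
```

The pattern fits the thread's observations: in `local` mode (and, with the
client shipping its own jars, `yarn-client`) the task bytes are deserialized
by the same Spark classes that serialized them, while a standalone cluster
deserializes tasks with its own installed Spark jars -- presumably where the
two `serialVersionUID`s for `org.apache.spark.rdd.RDD` collide.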
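And the fix the thread converges on for 3.2.0: pin `serialVersionUID`
explicitly. When a `Serializable` class omits the field, the JVM derives one
from the class's structure (fields, methods, interfaces), so an innocuous
recompile or dependency bump can change it and break wire compatibility --
exactly the `InvalidClassException` above. A minimal sketch with a
hypothetical class (`TraversalPayload` is illustrative, not a real TinkerPop
type):

```java
import java.io.Serializable;

// Sketch of the practice Marko is describing, on a made-up class.
public class TraversalPayload implements Serializable {

    // Without this field, the JVM computes a serialVersionUID from the
    // class structure at runtime. Any structural change -- or even a
    // different compiler -- can alter the computed value, and deserializing
    // an old stream then fails with "local class incompatible: stream
    // classdesc serialVersionUID = ..., local class serialVersionUID = ...".
    // Declaring it pins the contract: bump it only on genuinely
    // incompatible changes.
    private static final long serialVersionUID = 1L;

    private final String label;

    public TraversalPayload(final String label) {
        this.label = label;
    }

    public String getLabel() {
        return label;
    }
}
```

The JDK's `serialver` tool prints the UID a class gets by default, which
makes it easy to see how two builds of the "same" class -- say, Spark 1.5.1
and 1.5.2 -- can end up disagreeing.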