Dylan Bethune-Waddell created TINKERPOP-1341:
------------------------------------------------
Summary: UnshadedKryoAdapter fails to deserialize StarGraph when
SparkConf sets spark.rdd.compress=true whereas GryoSerializer works
Key: TINKERPOP-1341
URL: https://issues.apache.org/jira/browse/TINKERPOP-1341
Project: TinkerPop
Issue Type: Bug
Components: io
Affects Versions: 3.2.1, 3.3.0
Reporter: Dylan Bethune-Waddell
Priority: Minor
While bulk loading a large dataset into Titan I ran into OOM errors, so I tried
tweaking some Spark configuration settings. I was already having trouble bulk
loading with the new GryoRegistrator/UnshadedKryo serialization shim in master:
a few hundred tasks into the edge loading stage (stage 5), exceptions are thrown
complaining that CompactBuffer[].class must be explicitly registered with Kryo.
With spark.rdd.compress=true, that same setup fails earlier, a few hundred tasks
into the vertex loading stage (stage 1) of BulkLoaderVertexProgram.

Using GryoSerializer instead of KryoSerializer with GryoRegistrator does not
fail: it loads the data successfully with the compression flag turned on,
whereas before enabling it I would just get OOM errors until the job was set
back so far that it failed. So this setting is desirable in some cases, and the
new serialization code seems to break it. This could be an upstream Spark issue,
based on this open JIRA ticket (https://issues.apache.org/jira/browse/SPARK-3630).

Here is the exception that is thrown, with the middle frames cut out:
com.esotericsoftware.kryo.KryoException: java.io.IOException: PARSING_ERROR(2)
at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
at com.esotericsoftware.kryo.io.Input.require(Input.java:169)
at com.esotericsoftware.kryo.io.Input.readLong_slow(Input.java:715)
at com.esotericsoftware.kryo.io.Input.readLong(Input.java:665)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readClassAndObject(UnshadedKryoAdapter.java:48)
at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readClassAndObject(UnshadedKryoAdapter.java:30)
at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.readEdges(StarGraphSerializer.java:134)
at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.read(StarGraphSerializer.java:91)
at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.read(StarGraphSerializer.java:45)
at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedSerializerAdapter.read(UnshadedSerializerAdapter.java:55)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readObject(UnshadedKryoAdapter.java:42)
at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readObject(UnshadedKryoAdapter.java:30)
at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.VertexWritableSerializer.read(VertexWritableSerializer.java:46)
at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.VertexWritableSerializer.read(VertexWritableSerializer.java:36)
at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedSerializerAdapter.read(UnshadedSerializerAdapter.java:55)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)
........................................................ and so on
.....................................
Caused by: java.io.IOException: PARSING_ERROR(2)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:358)
at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:167)
at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:150)
at com.esotericsoftware.kryo.io.Input.fill(Input.java:140)
... 51 more
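
For reference, the two serializer setups compared above can be sketched as
plain Spark property maps. This is only an illustration, not the exact
configuration used in the failing job: the class name SerializerConfigs and its
method names are hypothetical, and the fully qualified class names for
GryoSerializer and GryoRegistrator are assumed from the package shown in the
stack trace. The property keys spark.serializer, spark.kryo.registrator, and
spark.rdd.compress are standard Spark settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the two property sets described in the report.
public class SerializerConfigs {

    // Assumed package, taken from the frames in the stack trace.
    private static final String GRYO_PKG =
            "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.";

    // Setup reported to fail: Spark's KryoSerializer with GryoRegistrator.
    public static Map<String, String> kryoWithGryoRegistrator(boolean rddCompress) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        conf.put("spark.kryo.registrator", GRYO_PKG + "GryoRegistrator");
        conf.put("spark.rdd.compress", Boolean.toString(rddCompress));
        return conf;
    }

    // Setup reported to work: GryoSerializer, with no Kryo registrator set.
    public static Map<String, String> gryoSerializer(boolean rddCompress) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.serializer", GRYO_PKG + "GryoSerializer");
        conf.put("spark.rdd.compress", Boolean.toString(rddCompress));
        return conf;
    }

    public static void main(String[] args) {
        // Print both property sets in key=value form for side-by-side comparison.
        kryoWithGryoRegistrator(true).forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println();
        gryoSerializer(true).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The only differences between the two maps are the spark.serializer value and
the presence of spark.kryo.registrator; spark.rdd.compress is identical in both,
which is what points at the serialization shim rather than at compression
alone.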