Dylan Bethune-Waddell created TINKERPOP-1341:
------------------------------------------------

             Summary: UnshadedKryoAdapter fails to deserialize StarGraph when SparkConf sets spark.rdd.compress=true whereas GryoSerializer works
                 Key: TINKERPOP-1341
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1341
             Project: TinkerPop
          Issue Type: Bug
          Components: io
    Affects Versions: 3.2.1, 3.3.0
            Reporter: Dylan Bethune-Waddell
            Priority: Minor


While trying to bulk load a large dataset into Titan I was running into OOM
errors, so I decided to try tweaking some Spark configuration settings.

I was already having trouble bulk loading with the new
GryoRegistrator/UnshadedKryo serialization shim in master: a few hundred tasks
into the edge loading stage (stage 5), exceptions are thrown complaining about
the need to explicitly register CompactBuffer[].class with Kryo. With
spark.rdd.compress=true, this same approach (KryoSerializer with
GryoRegistrator) fails earlier, a few hundred tasks into the vertex loading
stage (stage 1) of BulkLoaderVertexProgram.

Using GryoSerializer instead of KryoSerializer with GryoRegistrator does not
fail and successfully loads the data with this compression flag turned on,
whereas before I would just get OOM errors until the job was eventually set
back so far that it failed. So it seems desirable to use this setting in some
instances, and the new serialization code seems to break it. This could be an
upstream Spark issue, based on this open JIRA ticket
(https://issues.apache.org/jira/browse/SPARK-3630). Here is the exception that
is thrown, with the middle bits cut out:

com.esotericsoftware.kryo.KryoException: java.io.IOException: PARSING_ERROR(2)
        at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
        at com.esotericsoftware.kryo.io.Input.require(Input.java:169)
        at com.esotericsoftware.kryo.io.Input.readLong_slow(Input.java:715)
        at com.esotericsoftware.kryo.io.Input.readLong(Input.java:665)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
        at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readClassAndObject(UnshadedKryoAdapter.java:48)
        at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readClassAndObject(UnshadedKryoAdapter.java:30)
        at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.readEdges(StarGraphSerializer.java:134)
        at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.read(StarGraphSerializer.java:91)
        at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphSerializer.read(StarGraphSerializer.java:45)
        at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedSerializerAdapter.read(UnshadedSerializerAdapter.java:55)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
        at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readObject(UnshadedKryoAdapter.java:42)
        at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedKryoAdapter.readObject(UnshadedKryoAdapter.java:30)
        at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.VertexWritableSerializer.read(VertexWritableSerializer.java:46)
        at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.VertexWritableSerializer.read(VertexWritableSerializer.java:36)
        at org.apache.tinkerpop.gremlin.spark.structure.io.gryo.kryoshim.unshaded.UnshadedSerializerAdapter.read(UnshadedSerializerAdapter.java:55)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)

        ... and so on ...

Caused by: java.io.IOException: PARSING_ERROR(2)
        at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
        at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
        at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
        at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:358)
        at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:167)
        at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:150)
        at com.esotericsoftware.kryo.io.Input.fill(Input.java:140)
        ... 51 more
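
For reference, the two setups compared above can be sketched as the Spark
property sets below. This is only a sketch: the properties are built as plain
Java maps so the snippet runs without Spark on the classpath, the helper class
and method names are hypothetical, and the fully qualified names for
GryoRegistrator and GryoSerializer are my assumption based on the package seen
in the stack trace. Any other settings used in the actual job are not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds the two Spark property sets compared in this report.
// Class and method names here are hypothetical helpers, not TinkerPop API.
final class SerializerConfigs {

    // Setting shared by both runs: the compression flag from the report.
    private static Map<String, String> base() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.rdd.compress", "true");
        return conf;
    }

    // Setup that fails in stage 1: Spark's KryoSerializer plus GryoRegistrator.
    // The GryoRegistrator package is an assumption taken from the stack trace.
    static Map<String, String> kryoWithGryoRegistrator() {
        Map<String, String> conf = base();
        conf.put("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        conf.put("spark.kryo.registrator",
                "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator");
        return conf;
    }

    // Setup that loads successfully: GryoSerializer, no registrator.
    // The GryoSerializer package is likewise an assumption.
    static Map<String, String> gryoSerializer() {
        Map<String, String> conf = base();
        conf.put("spark.serializer",
                "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer");
        return conf;
    }

    public static void main(String[] args) {
        // Print both property sets in key=value form for side-by-side comparison.
        kryoWithGryoRegistrator().forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println();
        gryoSerializer().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The only differences between the two maps are spark.serializer and the
presence of spark.kryo.registrator, which is where the behaviour diverges.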



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
