Aaron Defazio created SPARK-6520:
------------------------------------

             Summary: Kryo serialization broken in the shell
                 Key: SPARK-6520
                 URL: https://issues.apache.org/jira/browse/SPARK-6520
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.3.0
            Reporter: Aaron Defazio


If I start spark as follows:
{quote}
~/spark-1.3.0-bin-hadoop2.4/bin/spark-shell --master local[1] --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
{quote}

Then using :paste, run 
{quote}
    case class Example(foo : String, bar : String)
    val ex = sc.parallelize(List(Example("foo1", "bar1"), Example("foo2", "bar2"))).collect()
{quote}

I get the error:
{quote}
$VAL10 ($iwC)
$outer ($iwC$$iwC)
$outer ($iwC$$iwC$Example)
  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)
  at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:979)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1873)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1895)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
{quote}

As far as I can tell, when using :paste, Kryo serialization fails for 
classes defined within the same paste. It works when the same statements are 
entered without :paste.
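
For comparison, this is the variant that works for me: the same two statements 
typed one at a time at the scala> prompt, without :paste, in a shell started 
with the same Kryo setting as above.
{quote}
    case class Example(foo : String, bar : String)
    val ex = sc.parallelize(List(Example("foo1", "bar1"), Example("foo2", "bar2"))).collect()
{quote}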

This issue seems serious to me, since Kryo serialization is virtually mandatory 
for performance (default serialization is 20x slower on my problem), and I 
assume feature parity between spark-shell and spark-submit is a goal.
Note that this is different from SPARK-6497, which covers the case when Kryo is 
set to require registration.



