[jira] [Commented] (SPARK-4882) PySpark broadcast breaks when using KryoSerializer

Fi (JIRA) Tue, 30 Dec 2014 17:07:48 -0800

    [ https://issues.apache.org/jira/browse/SPARK-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261756#comment-14261756 ]


Fi commented on SPARK-4882:
---------------------------

Thanks for the quick turnaround on this fix!

I'll try it out once it is merged into trunk, and re-enable the Kryo 
serializer then.

Thanks,
FI


> PySpark broadcast breaks when using KryoSerializer
> --------------------------------------------------
>
>                 Key: SPARK-4882
>                 URL: https://issues.apache.org/jira/browse/SPARK-4882
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Fi
>            Assignee: Josh Rosen
>             Fix For: 1.3.0, 1.2.1
>
>
> When KryoSerializer is used, PySpark throws a NullPointerException when 
> trying to send broadcast variables to workers. This issue does not occur 
> when the master is {{local}}, or when using the default JavaSerializer.
> *Reproduction*:
> Run
> {code}
> SPARK_LOCAL_IP=127.0.0.1 ./bin/pyspark --master local-cluster[2,2,512] \
>   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
> {code}
> then run
> {code}
> b = sc.broadcast("hello")
> sc.parallelize([0]).flatMap(lambda x: b.value).collect()
> {code}
> This job fails because all tasks throw the following exception:
> {code}
> 14/12/28 14:26:08 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 8, localhost): java.lang.NullPointerException
>       at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:589)
>       at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(PythonRDD.scala:232)
>       at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(PythonRDD.scala:228)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>       at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>       at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>       at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>       at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:228)
>       at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:203)
>       at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:203)
>       at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1515)
>       at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:202)
> {code}
> Because KryoSerializer may be enabled in the {{spark-defaults.conf}} file, 
> users can hit this error without having selected Kryo themselves.
> *Workaround*:
> Override the {{spark.serializer}} setting to use the default Java serializer.
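
The workaround can be sketched as a small check, assuming the contents of {{spark-defaults.conf}} are available as text. The helper names configured_serializer and workaround_args are hypothetical and not part of Spark; only the spark.serializer property and the two serializer class names are real. The parsing is deliberately simplified (whitespace- or '='-separated key and value, '#' comments) and may not cover every form the real properties loader accepts.

```python
import re

# Real Spark class names; everything else in this sketch is illustrative.
KRYO = "org.apache.spark.serializer.KryoSerializer"
JAVA = "org.apache.spark.serializer.JavaSerializer"


def configured_serializer(conf_text):
    """Return the spark.serializer value found in the given text, or None.

    Simplified parsing (an assumption of this sketch): blank lines and
    '#' comments are skipped, and the key is split from the value on the
    first run of whitespace and/or '='. The last occurrence wins.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) == 2 and parts[0] == "spark.serializer":
            value = parts[1].strip()
    return value


def workaround_args(conf_text):
    """Return extra command-line arguments that force the Java serializer
    when the defaults select Kryo, or an empty list otherwise."""
    if configured_serializer(conf_text) == KRYO:
        return ["--conf", "spark.serializer=" + JAVA]
    return []
```

The returned arguments could be appended to the ./bin/pyspark command from the reproduction; a --conf value given on the command line takes precedence over the same key in spark-defaults.conf.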


