[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

Prasanth (JIRA) Tue, 04 Apr 2017 13:28:59 -0700

    [ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955747#comment-15955747 ]


Prasanth commented on SPARK-18737:
----------------------------------

We are using Spark 2.0.1. Can you tell us how to "disable the Kryo auto-pick 
for streaming from the Java API" as a workaround?

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -------------------------------------------------------------
>
>                 Key: SPARK-18737
>                 URL: https://issues.apache.org/jira/browse/SPARK-18737
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 77 on executor id: 2 hostname: ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal): com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 13994
>         at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
>         at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
>         at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
>         at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
>         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>         at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
>         at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>         at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>         at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>         at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>         at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
>         at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>         at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
>         at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>         at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
>         at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
>         at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>         at org.apache.spark.scheduler.Task.run(Task.scala:86)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception, and over time the Spark 
> Streaming job stops processing because too many tasks have failed.
> Our action was to use conf.set("spark.serializer", 
> "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class 
> registration with conf.set("spark.kryo.registrationRequired", "false"). We 
> hope to identify the root cause of the exception. 
> However, the JavaSerializer setting is obviously ignored by the Spark 
> internals. Despite the setting, we still see the exception printed in the 
> log and tasks fail. The occurrence seems to be non-deterministic, but it 
> becomes more frequent over time.
> Several questions we could not answer during our troubleshooting:
> 1. How can the debug log for Kryo be enabled? We tried following the 
> minlog documentation, but no output can be found.
> 2. Is the serializer setting effective for Spark-internal serialization? How 
> can the JavaSerializer be forced on internal serialization for 
> worker-to-driver communication?
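
For reference, the two settings quoted in the issue can also be passed on the command line instead of being set in application code. The following is a minimal sketch, assuming the job is launched through spark-submit with its --conf key=value option; the class name SerializerConfArgs and its helper methods are hypothetical and are not part of Spark. It only restates the reporter's settings and is not a confirmed workaround: according to the report, Spark 2.0.x may still hit the Kryo code path despite them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Spark): renders the settings from the
// issue as spark-submit "--conf key=value" arguments.
public class SerializerConfArgs {

    // The two settings quoted in the issue. Spark configuration values are
    // plain strings, so the boolean is written as "false".
    static Map<String, String> settings() {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("spark.serializer", "org.apache.spark.serializer.JavaSerializer");
        s.put("spark.kryo.registrationRequired", "false");
        return s;
    }

    // Turns each entry into a "--conf" flag followed by "key=value",
    // preserving insertion order.
    static List<String> toConfArgs(Map<String, String> settings) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Prints the arguments on one line, ready to paste after "spark-submit".
        System.out.println(String.join(" ", toConfArgs(settings())));
    }
}
```

Passing the values as strings mirrors how SparkConf stores them, which is why "false" is quoted rather than given as a boolean literal.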



