[ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741582#comment-14741582 ]
Glenn Strycker commented on SPARK-10569:
----------------------------------------

Is this issue related to HIVE-7540 or SPARK-2421?

> Kryo serialization fails on sortByKey operation on registered RDDs
> ------------------------------------------------------------------
>
>                 Key: SPARK-10569
>                 URL: https://issues.apache.org/jira/browse/SPARK-10569
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Glenn Strycker
>
> I have code that creates RDDs, persists, checkpoints, and materializes them
> (using count()), and these RDDs are serialized with Kryo, using the standard
> code.
> I have "kryo.setRegistrationRequired(true)", which is useful for debugging my
> code to find out which RDDs I haven't registered. Unfortunately, having this
> setting turned on does not seem compatible with Spark internals.
> When my code encounters a sortByKey, it fails, giving me an error:
> {noformat}
> User class threw exception: Job aborted due to stage failure: Task 1 in stage
> 25.0 failed 40 times, most recent failure: Lost task 1.39 in stage 25.0 (TID
> 232, <server name>): java.lang.IllegalArgumentException: Class is not
> registered: scala.Tuple3[]
> Note: To register this class use: kryo.register(scala.Tuple3[].class);
>         at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
>         at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
>         at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
>         at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
>         at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:162)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> {noformat}
> Why is scala.Tuple3[] not registered? I attempted to register it using
> various forms of "kryo.register(scala.Tuple3[].class)", but this didn't seem
> to work.
> I tried making sure that both the keys and the values of my RDD are
> registered, in addition to the entire RDD. I have lines like this:
> {code}
> kryo.register(classOf[(((Any,Any),(Any,Any)),((Any,Any),Any))])
> kryo.register(classOf[((Any,Any),(Any,Any))])
> kryo.register(classOf[((Any, Any),Any)])
> {code}
> Again, my program is only dying on the sortByKey command. If I get rid of
> it, the code proceeds just fine, but I need this for certain operations
> (assigning indices based on sort order).
> FYI, it is failing on RDDs of all types. I verified this in several places
> in my program.
> {code}
> myRDD.sortByKey(ascending=true).collect().foreach(println)
> {code}
> doesn't work (gives the error above), but
> {code}
> myRDD.collect().foreach(println)
> {code}
> works just fine. My code also works if I turn off
> "kryo.setRegistrationRequired(true)".
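
For anyone hitting the same error, here is a minimal sketch of how the array class named in the exception might be registered from Scala. It is not a confirmed fix for this issue. `scala.Tuple3[].class` is Java syntax; the Scala spelling is `classOf[Array[Tuple3[_, _, _]]]`. Also note that type arguments are erased at runtime, so the three nested-tuple registrations quoted above all resolve to the same class, `scala.Tuple2`. The names `ClassesToRegister` and `MyRegistrator` are hypothetical, and the sketch assumes Spark and Kryo are on the classpath.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical holder for the classes to register (the name is illustrative).
object ClassesToRegister {
  // Scala spelling of the Java expression scala.Tuple3[].class
  // that appears in the error message.
  val tuple3Array: Class[_] = classOf[Array[Tuple3[_, _, _]]]

  // Type arguments are erased at runtime, so every nested-tuple
  // registration in the report resolves to this single class.
  val tuple2: Class[_] = classOf[Tuple2[_, _]]
}

// Hypothetical registrator; it would be enabled by setting
// spark.kryo.registrator to this class's fully qualified name.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Same strict mode as in the report: unregistered classes throw.
    kryo.setRegistrationRequired(true)
    kryo.register(ClassesToRegister.tuple2)
    kryo.register(ClassesToRegister.tuple3Array)
  }
}
```

With strict registration on, fixing one class may simply surface the next unregistered one, so expect to repeat this for whatever class the following exception names.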