[ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741486#comment-14741486 ]

Glenn Strycker commented on SPARK-10569:
----------------------------------------

Playing around with additional registrations, I tried adding things like 
"kryo.register(classOf[(Any,Any,Any)])" and 
"kryo.register(classOf[Array[(Any,Any,Any)]])", and at one point the error 
changed from "Class is not registered: scala.Tuple3[]" to "User class threw 
exception: Task not serializable".

It still fails at the same spot in the code, though -- the sortByKey call.
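As a side note, the "scala.Tuple3[]" in the error message is the Java-source spelling of an array-of-Tuple3 class. The class tokens involved can be checked with plain Scala, no Spark or Kryo needed; the commented-out register() calls show what I was attempting (a sketch, not confirmed to fix the issue):

{code}
// Plain-Scala check of the classes named in the error; no Spark/Kryo required.
object Tuple3ClassCheck {
  // The tuple class itself:
  val tupleClass: Class[_] = classOf[(Any, Any, Any)]
  // An array of such tuples -- this is what "scala.Tuple3[]" denotes:
  val tupleArrayClass: Class[_] = classOf[Array[(Any, Any, Any)]]

  def main(args: Array[String]): Unit = {
    println(tupleClass.getName)      // scala.Tuple3
    println(tupleArrayClass.getName) // [Lscala.Tuple3;
    // With a Kryo instance in scope, the registrations would be:
    //   kryo.register(tupleClass)
    //   kryo.register(tupleArrayClass)
  }
}
{code}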

> Kryo serialization fails on sortByKey operation on registered RDDs
> ------------------------------------------------------------------
>
>                 Key: SPARK-10569
>                 URL: https://issues.apache.org/jira/browse/SPARK-10569
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Glenn Strycker
>
> I have code that creates RDDs, persists, checkpoints, and materializes (using 
> count()), and these RDDs are serialized with Kryo, using the standard code.
> I have "kryo.setRegistrationRequired(true)", which is useful for debugging my 
> code to find out which RDDs I haven't registered.  Unfortunately, having this 
> setting turned on does not seem compatible with Spark internals.
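> For reference, the Kryo setup is wired in through SparkConf roughly like this (standard Spark configuration keys; the registrator class name here is illustrative):
> {code}
> val conf = new SparkConf()
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>   .set("spark.kryo.registrator", "com.example.MyRegistrator") // illustrative name
>   .set("spark.kryo.registrationRequired", "true")
> {code}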
> When my code encounters a sortByKey, it fails, giving me an error:
> {noformat}
> User class threw exception: Job aborted due to stage failure: Task 1 in stage 25.0 failed 40 times, most recent failure: Lost task 1.39 in stage 25.0 (TID 232, <server name>): java.lang.IllegalArgumentException: Class is not registered: scala.Tuple3[]
> Note: To register this class use: kryo.register(scala.Tuple3[].class);
> at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
> at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
> at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:162)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> {noformat}
> Why is scala.Tuple3[] not registered?  I attempted to register it using 
> various forms of "kryo.register(scala.Tuple3[].class)", but this didn't seem 
> to work.
> I tried making sure that both the keys and the values of my RDD are 
> registered, in addition to the entire RDD tuple type.  I have lines like this:
> {code}
>     kryo.register(classOf[(((Any,Any),(Any,Any)),((Any,Any),Any))])
>     kryo.register(classOf[((Any,Any),(Any,Any))])
>     kryo.register(classOf[((Any, Any),Any)])
> {code}
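> These registrations live inside a KryoRegistrator subclass, roughly like this (the class name is illustrative):
> {code}
> import com.esotericsoftware.kryo.Kryo
> import org.apache.spark.serializer.KryoRegistrator
>
> class MyRegistrator extends KryoRegistrator {
>   override def registerClasses(kryo: Kryo): Unit = {
>     kryo.register(classOf[(((Any,Any),(Any,Any)),((Any,Any),Any))])
>     kryo.register(classOf[((Any,Any),(Any,Any))])
>     kryo.register(classOf[((Any,Any),Any)])
>   }
> }
> {code}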
> Again, my program is only dying on the sortByKey command.  If I get rid of 
> it, the code proceeds just fine, but I need this for certain operations 
> (assigning indices based on sort order).
> FYI, it is failing on RDDs of all types... I verified this in several places 
> in my program.
> {code}
> myRDD.sortByKey(ascending=true).collect().foreach(println)
> {code}
> doesn't work (gives the error above), but
> {code}
> myRDD.collect().foreach(println)
> {code}
> works just fine.  My code also works if I turn off 
> "kryo.setRegistrationRequired(true)".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
