[ 
https://issues.apache.org/jira/browse/SPARK-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092367#comment-14092367
 ] 

Graham Dennis commented on SPARK-2878:
--------------------------------------

Here's the problem as I see it: to use a custom Kryo registrator, the 
application jar must be available to the executor JVM.  Currently, the 
application jar isn't on the classpath when the executor launches, so it has 
to be added later.  This happens when a task is sent to the executor JVM.  But 
the only reason the executor JVM can deserialise the task is that the closure 
serialiser can differ from the normal object serialiser, and it defaults to 
the Java serialiser.  If you tried to use the Kryo serialiser to serialise the 
closure, you'd have a chicken-and-egg problem: to know what jars the task 
needs, you need to deserialise the task, but to deserialise the task you need 
the application jars that contain the custom Kryo registrator.

A similar problem would occur if you tried to set a custom serialiser that only 
existed in the application jar.
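
To illustrate why the jar has to be visible before anything is deserialised: 
a class that is looked up by name can only be resolved by a class loader that 
can see it.  The following is a minimal, stdlib-only Java sketch of that 
mechanism, not Spark's actual code.  The names RegistratorLookup, 
MyRegistrator and canLoad are hypothetical and exist only for this example, 
and the "executor without the application jar" is simulated with an empty 
URLClassLoader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class RegistratorLookup {

    /** Hypothetical stand-in for a registrator class shipped in the application jar. */
    public static class MyRegistrator {}

    /** Returns true if the loader can resolve className, false if it throws ClassNotFoundException. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // Resolve by name without initialising the class, as a by-name lookup would.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = MyRegistrator.class.getName();

        // Loader that can see the "application" classes.
        ClassLoader withApp = RegistratorLookup.class.getClassLoader();

        // Loader with no URLs and only the bootstrap loader as parent:
        // it sees JDK classes but none of the application classes.
        try (URLClassLoader withoutApp = new URLClassLoader(new URL[0], null)) {
            System.out.println("with application classes:    " + canLoad(name, withApp));
            System.out.println("without application classes: " + canLoad(name, withoutApp));
        }
    }
}
```

The first lookup succeeds and the second fails, which is the situation an 
executor is in if it has to resolve the registrator class before the 
application jar has been added.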

So my question is this: is there a reason that the application jar isn't added 
to (the end of) the classpath of the executor JVMs at launch time?  This would 
allow the application jar to contain a custom serialiser and/or a custom Kryo 
registrator.  Additional jars could still be added to the executors later, but 
users shouldn't rely on these to change the behaviour of the Kryo registrator 
(as that would almost certainly lead to inconsistencies).

> Inconsistent Kryo serialisation with custom Kryo Registrator
> ------------------------------------------------------------
>
>                 Key: SPARK-2878
>                 URL: https://issues.apache.org/jira/browse/SPARK-2878
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0, 1.0.2
>         Environment: Linux RedHat EL 6, 4-node Spark cluster.
>            Reporter: Graham Dennis
>
> The custom Kryo Registrator (a class with the 
> org.apache.spark.serializer.KryoRegistrator trait) is not used with every 
> Kryo instance created, and this causes inconsistent serialisation and 
> deserialisation.
> The Kryo Registrator is sometimes not used because of a 
> ClassNotFoundException that only occurs if it *isn't* the Worker thread (of 
> an Executor) that tries to create the KryoRegistrator.
> A complete description of the problem and a project reproducing the problem 
> can be found at https://github.com/GrahamDennis/spark-kryo-serialisation
> I have currently only tested this with Spark 1.0.0, but will try to test 
> against 1.0.2.


