[ 
https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554819#comment-14554819
 ] 

Josh Rosen commented on SPARK-7708:
-----------------------------------

I can investigate this in more detail later, but is registering 
SerializableBuffer with JavaSerialization really that expensive?  I imagine 
that the most expensive part of the closure serialization is serializing the 
closure itself.  Once we already have the serialized closure's bytes, I don't 
think that including them in the TaskDescription should be very expensive 
because we're only dealing with one or two fields vs. an entire object graph.  
On the other hand, if it's not much work to implement Kryo serialization 
methods for that class then maybe that's fine, too.

Good find with that Kryo deserialization thread-safety issue; I'd be interested 
in seeing your fix for this.

Feel free to submit a PR for this; I'd be happy to help review.

> Incorrect task serialization with Kryo closure serializer
> ---------------------------------------------------------
>
>                 Key: SPARK-7708
>                 URL: https://issues.apache.org/jira/browse/SPARK-7708
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.2
>            Reporter: Akshat Aranya
>
> I've been investigating the use of Kryo for closure serialization with Spark 
> 1.2, and it seems like I've hit upon a bug:
> When a task is serialized before scheduling, the following log message is 
> generated:
> [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, 
> <host>, PROCESS_LOCAL, 302 bytes)
> This message comes from TaskSetManager which serializes the task using the 
> closure serializer.  Before the message is sent out, the TaskDescription 
> (which included the original task as a byte array), is serialized again into 
> a byte array with the closure serializer.  I added a log message for this in 
> CoarseGrainedSchedulerBackend, which produces the following output:
> [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132
> The serialized size of TaskDescription (132 bytes) turns out to be _smaller_ 
> than serialized task that it contains (302 bytes). This implies that 
> TaskDescription.buffer is not getting serialized correctly.
> On the executor side, the deserialization produces a null value for 
> TaskDescription.buffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to