Akshat Aranya created SPARK-7708:
------------------------------------

             Summary: Incorrect task serialization with Kryo closure serializer
                 Key: SPARK-7708
                 URL: https://issues.apache.org/jira/browse/SPARK-7708
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.2.2
            Reporter: Akshat Aranya


I've been investigating the use of Kryo for closure serialization with Spark 
1.2, and it seems like I've hit upon a bug:

When a task is serialized before scheduling, the following log message is 
generated:

[info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, <host>, PROCESS_LOCAL, 302 bytes)

This message comes from TaskSetManager, which serializes the task using the 
closure serializer.  Before the message is sent out, the TaskDescription (which 
includes the original task as a byte array) is serialized again into a byte 
array with the closure serializer.  I added a log message for this in 
CoarseGrainedSchedulerBackend, which produces the following output:

[info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132

The serialized size of TaskDescription (132 bytes) turns out to be _smaller_ 
than the serialized task that it contains (302 bytes). This implies that 
TaskDescription.buffer is not getting serialized correctly.

On the executor side, the deserialization produces a null value for 
TaskDescription.buffer.
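
The size comparison and the null-after-deserialization symptom can both be
checked with a round trip. The following is only a minimal sketch of that
check, not Spark's or Kryo's code: it uses the JDK's built-in java.io
serialization as a stand-in for the closure serializer, and the names
BufferRoundTrip, Envelope, and payloadSurvives are hypothetical. It assumes a
serializer that does not compress, so a wrapper that really carries its byte
array cannot serialize to fewer bytes than the array itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class BufferRoundTrip {

    // Hypothetical stand-in for a description object that wraps an
    // already-serialized task as a byte array (not Spark's TaskDescription).
    static class Envelope implements Serializable {
        private static final long serialVersionUID = 1L;
        final String taskId;
        final byte[] buffer;

        Envelope(String taskId, byte[] buffer) {
            this.taskId = taskId;
            this.buffer = buffer;
        }
    }

    // Serialize with JDK serialization (assumption: stands in for the
    // closure serializer; this is not Kryo).
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Returns true only if the wrapper is at least as large as its payload
    // and the payload comes back non-null and byte-for-byte identical.
    static boolean payloadSurvives(Envelope e) throws IOException, ClassNotFoundException {
        byte[] wire = serialize(e);
        if (wire.length < e.buffer.length) {
            return false; // the payload cannot have been written in full
        }
        Envelope copy = (Envelope) deserialize(wire);
        return copy.buffer != null && Arrays.equals(copy.buffer, e.buffer);
    }

    public static void main(String[] args) throws Exception {
        // 302 bytes mirrors the task size reported in the log line above.
        Envelope e = new Envelope("124.1", new byte[302]);
        System.out.println("payload=" + e.buffer.length
                + " envelope=" + serialize(e).length
                + " ok=" + payloadSurvives(e));
    }
}
```

Running the same kind of check against the actual closure serializer and the
real TaskDescription would show whether the buffer field is being dropped on
write or lost on read.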



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
