[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554797#comment-14554797 ]
Akshat Aranya edited comment on SPARK-7708 at 5/21/15 6:22 PM:
---------------------------------------------------------------
I was able to get this working with a couple of fixes:
1. Implementing Kryo serialization methods in SerializableBuffer. An
alternative is to register SerializableBuffer with Kryo's JavaSerializer,
but that defeats the purpose of using Kryo.
2. The second fix is a bit hokey. Tasks within one executor process are
deserialized from a shared broadcast variable, and Kryo deserialization
modifies its input buffer, so deserializing concurrently from that shared
buffer isn't thread-safe
(https://code.google.com/p/kryo/issues/detail?id=128). I worked around this by
copying the broadcast buffer to a task-local buffer before deserializing.
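A minimal, JDK-only sketch of what the first fix amounts to, so it can run standalone. BufferWrapper, write, read, serialize and deserialize are hypothetical names; in the actual patch the methods would presumably live on SerializableBuffer and implement Kryo's KryoSerializable interface (write(Kryo, Output) and read(Kryo, Input)), for which DataOutputStream and DataInputStream stand in here. The idea shown is only that the buffer's bytes are written explicitly, as a length prefix followed by the payload, rather than being left to a field-by-field serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferWrapperDemo {
    // Hypothetical stand-in for SerializableBuffer: a holder for a ByteBuffer
    // whose contents must be written out explicitly.
    static final class BufferWrapper {
        final ByteBuffer buffer;

        BufferWrapper(ByteBuffer buffer) {
            this.buffer = buffer;
        }

        // Shaped like KryoSerializable.write(Kryo, Output): length prefix, then bytes.
        void write(DataOutputStream out) throws IOException {
            ByteBuffer view = buffer.duplicate(); // do not move the caller's position
            byte[] bytes = new byte[view.remaining()];
            view.get(bytes);
            out.writeInt(bytes.length);
            out.write(bytes);
        }

        // Shaped like KryoSerializable.read(Kryo, Input): read exactly that many bytes back.
        static BufferWrapper read(DataInputStream in) throws IOException {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            return new BufferWrapper(ByteBuffer.wrap(bytes));
        }
    }

    static byte[] serialize(BufferWrapper wrapper) {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytesOut)) {
            wrapper.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytesOut.toByteArray();
    }

    static BufferWrapper deserialize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return BufferWrapper.read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[302]; // arbitrary payload size
        Arrays.fill(payload, (byte) 7);
        byte[] serialized = serialize(new BufferWrapper(ByteBuffer.wrap(payload)));
        BufferWrapper copy = deserialize(serialized);
        // A 4-byte length prefix plus the payload: never smaller than the payload.
        System.out.println("serialized size = " + serialized.length); // 306
        System.out.println("round trip ok = " + copy.buffer.equals(ByteBuffer.wrap(payload))); // true
    }
}
```

Using duplicate() means serializing the wrapper does not consume the source buffer, so the same buffer can be serialized again later.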
These fixes are for 1.2, so I'll see if I can port them to master and write a
test for them.
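The copy-before-deserialize workaround from the second fix can be sketched with the JDK alone. destructiveDecode and encode are hypothetical helpers (not Kryo's API): the decoder mimics the reported behaviour by mutating the array it reads from. Each task takes a duplicate() view of the shared buffer, so the shared position is never moved, and copies the bytes into its own array before decoding, so the shared "broadcast" bytes stay intact however many tasks run at once.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LocalCopyDemo {
    private static final byte KEY = 0x5A;

    // Hypothetical encoder for the demo: XOR every byte with KEY.
    static byte[] encode(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < data.length; i++) {
            data[i] ^= KEY;
        }
        return data;
    }

    // Hypothetical decoder that, like the behaviour described for Kryo,
    // mutates the array it reads from (it un-XORs the bytes in place).
    static String destructiveDecode(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] ^= KEY;
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    // The workaround: copy the shared bytes into a task-local array first.
    static String decodeFromShared(ByteBuffer shared) {
        ByteBuffer view = shared.duplicate(); // own position/limit, shared content
        byte[] local = new byte[view.remaining()];
        view.get(local);
        return destructiveDecode(local); // only the local copy is mutated
    }

    // Runs `tasks` decodes of the same shared buffer on a small thread pool.
    static List<String> decodeConcurrently(ByteBuffer shared, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> decodeFromShared(shared)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.wrap(encode("serialized task"));
        List<String> results = decodeConcurrently(shared, 8);
        System.out.println(results.stream().allMatch("serialized task"::equals)); // true
        System.out.println(shared.equals(ByteBuffer.wrap(encode("serialized task")))); // true: shared bytes untouched
    }
}
```

If the tasks instead decoded shared.array() directly, the first in-place XOR would corrupt the input for every later task, and concurrent tasks would race on the same bytes. The copy costs one extra allocation per task.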
> Incorrect task serialization with Kryo closure serializer
> ---------------------------------------------------------
>
> Key: SPARK-7708
> URL: https://issues.apache.org/jira/browse/SPARK-7708
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.2
> Reporter: Akshat Aranya
>
> I've been investigating the use of Kryo for closure serialization with Spark
> 1.2, and it seems like I've hit upon a bug:
> When a task is serialized before scheduling, the following log message is
> generated:
> [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, <host>, PROCESS_LOCAL, 302 bytes)
> This message comes from TaskSetManager, which serializes the task using the
> closure serializer. Before the message is sent out, the TaskDescription
> (which includes the original task as a byte array) is serialized again into
> a byte array with the closure serializer. I added a log message for this in
> CoarseGrainedSchedulerBackend, which produces the following output:
> [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132
> The serialized size of TaskDescription (132 bytes) turns out to be _smaller_
> than the serialized task it contains (302 bytes). This implies that
> TaskDescription.buffer is not getting serialized correctly.
> On the executor side, the deserialization produces a null value for
> TaskDescription.buffer.
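The size argument in the report can be reproduced in miniature with plain Java serialization, under an assumption the report does not state: that the wrapper's buffer is a transient field written only by custom Java serialization hooks, so a serializer that neither writes transient fields nor calls those hooks emits only the wrapper's skeleton. (Kryo's default field-by-field serializer does not call Java's writeObject/readObject, which is consistent with the first fix in the comment above.) SkippedBuffer and WrittenBuffer are hypothetical stand-ins, not Spark classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class SizeCheckDemo {
    // Hypothetical wrapper whose buffer is silently dropped: the field is
    // transient and nothing writes it, so only the object skeleton is emitted.
    static class SkippedBuffer implements Serializable {
        private static final long serialVersionUID = 1L;
        transient ByteBuffer buffer;

        SkippedBuffer(ByteBuffer buffer) {
            this.buffer = buffer;
        }
    }

    // Same shape, but custom hooks write and read the bytes explicitly.
    static class WrittenBuffer implements Serializable {
        private static final long serialVersionUID = 1L;
        transient ByteBuffer buffer;

        WrittenBuffer(ByteBuffer buffer) {
            this.buffer = buffer;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            ByteBuffer view = buffer.duplicate(); // keep the caller's position intact
            byte[] bytes = new byte[view.remaining()];
            view.get(bytes);
            out.writeInt(bytes.length);
            out.write(bytes);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            buffer = ByteBuffer.wrap(bytes);
        }
    }

    static byte[] toBytes(Object value) {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytesOut)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytesOut.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap(new byte[302]); // same size as the task in the report
        byte[] skipped = toBytes(new SkippedBuffer(payload));
        byte[] written = toBytes(new WrittenBuffer(payload));
        System.out.println("payload=302 skipped=" + skipped.length + " written=" + written.length);
        System.out.println("skipped buffer after read: " + ((SkippedBuffer) fromBytes(skipped)).buffer); // null
        System.out.println("written buffer after read: " + ((WrittenBuffer) fromBytes(written)).buffer.remaining()); // 302
    }
}
```

The skipped version serializes to fewer bytes than its 302-byte payload and reads back with a null buffer, the same two symptoms reported for TaskDescription; the explicitly written version is larger than its payload and round-trips it.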