[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564266#comment-14564266 ]
Josh Rosen commented on SPARK-7708:
-----------------------------------
Also, it looks like Chill is still using Kryo 2.21 instead of a newer version,
apparently because of Storm incompatibilities or other dependency problems:
https://github.com/twitter/chill/commit/3869b0122660c908e189ff08b615bd7221956224#commitcomment-8362755.
Therefore, a version bump might be an uphill battle, since it could require
involvement from the Kryo and/or Chill developers.
If the only blocker for Chill is Storm compatibility issues that don't affect
us, we might consider publishing our own fork of Chill under the
org.apache.spark namespace, similar to how we used to publish custom versions
of Pyrolite. If possible, though, I'd like to avoid that option and only use
it as a last resort.
I can't spend much more time investigating this myself right now, but I would
really appreciate it if someone could dig into these issues in more detail and
post a summary here.
> Incorrect task serialization with Kryo closure serializer
> ---------------------------------------------------------
>
> Key: SPARK-7708
> URL: https://issues.apache.org/jira/browse/SPARK-7708
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.2
> Reporter: Akshat Aranya
>
> I've been investigating the use of Kryo for closure serialization with Spark
> 1.2, and it seems like I've hit upon a bug:
> When a task is serialized before scheduling, the following log message is
> generated:
> [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, <host>, PROCESS_LOCAL, 302 bytes)
> This message comes from TaskSetManager, which serializes the task using the
> closure serializer. Before the message is sent out, the TaskDescription
> (which includes the original task as a byte array) is serialized again into
> a byte array with the closure serializer. I added a log message for this in
> CoarseGrainedSchedulerBackend, which produces the following output:
> [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132
> The serialized size of the TaskDescription (132 bytes) turns out to be
> _smaller_ than the serialized task it contains (302 bytes). This implies that
> TaskDescription.buffer is not getting serialized correctly.
> On the executor side, the deserialization produces a null value for
> TaskDescription.buffer.
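A minimal sketch of the size sanity check used in the report, written with plain java.io serialization rather than Kryo or Spark so that it runs without dependencies. The names EnvelopeSizeCheck, KeptEnvelope, DroppedEnvelope and payloadLikelyDropped are hypothetical. DroppedEnvelope keeps its payload in a transient field, which default serialization skips; that reproduces both reported symptoms (an envelope smaller than its payload, and a null payload after deserialization), but it is only one possible mechanism and is not a claim about the actual cause of SPARK-7708.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class EnvelopeSizeCheck {

    /** Hypothetical envelope that keeps its payload in a regular field. */
    public static class KeptEnvelope implements Serializable {
        private static final long serialVersionUID = 1L;
        public final long taskId;
        public final byte[] payload;

        public KeptEnvelope(long taskId, byte[] payload) {
            this.taskId = taskId;
            this.payload = payload;
        }
    }

    /** Hypothetical envelope whose payload is transient, so default serialization drops it. */
    public static class DroppedEnvelope implements Serializable {
        private static final long serialVersionUID = 1L;
        public final long taskId;
        public transient byte[] payload;

        public DroppedEnvelope(long taskId, byte[] payload) {
            this.taskId = taskId;
            this.payload = payload;
        }
    }

    /** Serializes an object with java.io and returns the raw bytes. */
    public static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back an object produced by serialize(). */
    public static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** A non-compressing serializer cannot shrink an envelope below the payload it carries. */
    public static boolean payloadLikelyDropped(int envelopeSize, int payloadSize) {
        return envelopeSize < payloadSize;
    }

    public static void main(String[] args) {
        byte[] task = new byte[302]; // same size as the serialized task in the log
        byte[] kept = serialize(new KeptEnvelope(342L, task));
        byte[] dropped = serialize(new DroppedEnvelope(342L, task));

        System.out.println("kept envelope: " + kept.length + " bytes, dropped? "
                + payloadLikelyDropped(kept.length, task.length));
        System.out.println("dropped envelope: " + dropped.length + " bytes, dropped? "
                + payloadLikelyDropped(dropped.length, task.length));

        DroppedEnvelope roundTrip = (DroppedEnvelope) deserialize(dropped);
        System.out.println("payload after round trip: " + roundTrip.payload);
    }
}
```

Running main shows the kept envelope larger than 302 bytes, the dropped envelope well under it, and a null payload after the round trip. The size comparison is only meaningful for a serializer that does not compress its output.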
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)