[
https://issues.apache.org/jira/browse/SPARK-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207732#comment-14207732
]
Andrew Ash commented on SPARK-720:
----------------------------------
A big design goal of Spark is that you don't have to have any type restrictions
on the objects contained in an RDD. If the parametrized type of the RDD
happens to implement Comparable then the shuffler can make optimizations, but
it's not a requirement. It's also not a requirement that the type implements
Serializable, because you could be using Kryo to serialize and transport an
object out of the Spark user's control that doesn't have the Serializable
marker interface. I think it would be a hard sell to add restrictions to the
type parametrized type of an RDD.
Getting over that, I think the intention of guaranteeing serialization via the
type system could work for standard JVM serialization (Serializable and
Externalizable interfaces) because a class's serializability is clearly marked
with those interfaces already. But I'm concerned that it couldn't be made to
work with other serializer systems such as Kryo where there is no convenient
marker interface.
Kryo does serialization by registering a serializer for each class, and using
that Class->Serializer map for future serialization by reflectively looking at
an object's type as it receives objects for serialization. The only way to
know if a class is serializable is to know what classes a Kryo instance has
registered at compile time, which I believe is impossible given that the
Registrator comes from outside the Spark codebase.
[~emchristiansen] do you see a way to implement this generically for JVM
serialization + Kryo + other systems in the future? I think we may have to
close this request for infeasibility.
> Statically guarantee serialization will succeed
> -----------------------------------------------
>
> Key: SPARK-720
> URL: https://issues.apache.org/jira/browse/SPARK-720
> Project: Spark
> Issue Type: Improvement
> Affects Versions: 0.7.1
> Reporter: Eric Christiansen
>
> First, thanks for developing Spark. It's great.
> Maybe I'm trying to serialize weird objects (eg Shapeless constructs), but I
> tend to get quite a few NotSerializableExceptions. These are pretty annoying
> because they happen at runtime, lengthening my code/debug cycle.
> I'd like it if Spark could introduce a serialization system that could
> statically check that serialization will succeed. One approach is to use
> typeclasses, perhaps using Spray-Json as inspiration. An added benefit of
> typeclasses is they can be used to serialize objects that were not originally
> intended to be serialized.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]