[
https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-9523.
------------------------------
Resolution: Not A Problem
I think you're just describing the difference between implementing Kryo and
Java serialization. This isn't specific to Spark, nor a problem. Are you talking
about the difference between the closure serializer and the data serializer?
Both can use Kryo, AFAIK, but are you setting both to use Kryo if that's what you want?
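For reference, a minimal sketch of what enabling Kryo for the data serializer
looks like in SparkConf (the registered record class is just a placeholder):

    import org.apache.spark.SparkConf

    // Illustrative record type to register with Kryo; replace with your own classes.
    case class MyRecord(id: Long, payload: String)

    val conf = new SparkConf()
      .setAppName("kryo-example")
      // Switch the data serializer from the default Java serializer to Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional, but avoids writing fully qualified class names into the stream.
      .registerKryoClasses(Array(classOf[MyRecord]))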
> Receiver for Spark Streaming does not naturally support kryo serializer
> -----------------------------------------------------------------------
>
> Key: SPARK-9523
> URL: https://issues.apache.org/jira/browse/SPARK-9523
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.3.1
> Environment: Windows 7 local mode
> Reporter: Yuhang Chen
> Priority: Minor
> Labels: kryo, serialization
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> In some cases, certain attributes of a class are not serializable but you
> still need them after the whole object has been serialized, so you have to
> customize your serialization code. For example, you can declare those
> attributes as transient, which makes them ignored during serialization, and
> then reassign their values during deserialization.
> Now, if you're using Java serialization, you have to implement
> Serializable and write that code in the readObject() and writeObject()
> methods; and if you're using Kryo serialization, you have to implement
> KryoSerializable and write it in the read() and write() methods.
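> A minimal sketch of what that duplication looks like (the JDBC connection is
> just an illustrative non-serializable resource; the class name is not from Spark):
>
>     import java.io.{ObjectInputStream, ObjectOutputStream}
>     import java.sql.{Connection, DriverManager}
>
>     import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
>     import com.esotericsoftware.kryo.io.{Input, Output}
>
>     // java.sql.Connection is not serializable, so it is marked transient and
>     // rebuilt after deserialization -- once for Java, once for Kryo.
>     class StateHolder(var jdbcUrl: String) extends Serializable with KryoSerializable {
>
>       @transient private var conn: Connection = DriverManager.getConnection(jdbcUrl)
>
>       // Java serialization hooks: readObject()/writeObject()
>       private def writeObject(out: ObjectOutputStream): Unit = {
>         out.defaultWriteObject()                    // writes jdbcUrl, skips the transient conn
>       }
>       private def readObject(in: ObjectInputStream): Unit = {
>         in.defaultReadObject()
>         conn = DriverManager.getConnection(jdbcUrl) // rebuild the transient field
>       }
>
>       // Kryo serialization hooks: read()/write() -- the same logic, written twice
>       override def write(kryo: Kryo, output: Output): Unit = {
>         output.writeString(jdbcUrl)
>       }
>       override def read(kryo: Kryo, input: Input): Unit = {
>         jdbcUrl = input.readString()
>         conn = DriverManager.getConnection(jdbcUrl) // rebuild the transient field, again
>       }
>     }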
> In Spark and Spark Streaming, you can set Kryo as the serializer to speed
> things up. However, the functions passed to RDD or DStream operations are
> still serialized with Java serialization, which means you only need to write
> the custom serialization code in the readObject() and writeObject() methods.
> But when it comes to Spark Streaming's Receiver, things are different. When
> you wish to customize an InputDStream, you must extend Receiver. However, it
> turns out that the Receiver is serialized with Kryo if you set the Kryo
> serializer in SparkConf, and falls back to Java serialization if you don't.
> So here is the problem: if you want to switch the serializer through
> configuration and have the Receiver work correctly with both Java and Kryo,
> you have to write all four of the methods above. First, this is redundant,
> since you write the serialization/deserialization code almost twice; second,
> nothing in the docs or in the code tells users to implement the
> KryoSerializable interface.
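> For illustration, a sketch of the custom Receiver that results (the host, port
> and socket are placeholders; a complete version would also have to deal with
> the parent class's state):
>
>     import java.io.{ObjectInputStream, ObjectOutputStream}
>     import java.net.Socket
>
>     import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
>     import com.esotericsoftware.kryo.io.{Input, Output}
>     import org.apache.spark.storage.StorageLevel
>     import org.apache.spark.streaming.receiver.Receiver
>
>     class MyReceiver(var host: String, var port: Int)
>       extends Receiver[String](StorageLevel.MEMORY_ONLY) with KryoSerializable {
>
>       @transient private var socket: Socket = _
>
>       def onStart(): Unit = {
>         socket = new Socket(host, port)
>         // ... read lines from the socket and call store() ...
>       }
>
>       def onStop(): Unit = {
>         if (socket != null) socket.close()
>       }
>
>       // Used when spark.serializer is left at Java serialization.
>       private def writeObject(out: ObjectOutputStream): Unit = out.defaultWriteObject()
>       private def readObject(in: ObjectInputStream): Unit = in.defaultReadObject()
>
>       // Used when spark.serializer is set to KryoSerializer.
>       override def write(kryo: Kryo, output: Output): Unit = {
>         output.writeString(host)
>         output.writeInt(port)
>       }
>       override def read(kryo: Kryo, input: Input): Unit = {
>         host = input.readString()
>         port = input.readInt()
>       }
>     }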
> Since all other function parameters are serialized with Java only, I suggest
> doing the same for the Receiver. It may be slower, but since serialization
> only happens once per interval, the cost is bearable. More importantly, it
> would cause less trouble.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]