[ 
https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9523:
-----------------------------
    Target Version/s:   (was: 1.3.1)
            Priority: Minor  (was: Major)
       Fix Version/s:     (was: 1.4.2)
                          (was: 1.3.2)

[~fish748] Please read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark  The 
fields on this JIRA can't be right: 1.3.1 was already released, and Fix 
Version doesn't apply to unresolved JIRAs. etc.

> Receiver for Spark Streaming does not naturally support kryo serializer
> -----------------------------------------------------------------------
>
>                 Key: SPARK-9523
>                 URL: https://issues.apache.org/jira/browse/SPARK-9523
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.1
>         Environment: Windows 7 local mode
>            Reporter: John Chen
>            Priority: Minor
>              Labels: kryo, serialization
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> In some cases, some attributes of a class are not serializable, yet you 
> still need them after the object is deserialized, so you have to customize 
> your serialization code. For example, you can declare those attributes as 
> transient, which makes them ignored during serialization, and then reassign 
> their values during deserialization.
> Now, if you're using Java serialization, you have to implement 
> Serializable and write that code in the readObject() and writeObject() 
> methods; and if you're using Kryo serialization, you have to implement 
> KryoSerializable and write it in the read() and write() methods.
> In Spark and Spark Streaming, you can set Kryo as the serializer to speed 
> things up. However, the functions passed to RDD or DStream operations are 
> still serialized with Java serialization, which means you only need to 
> write the custom serialization code in readObject() and writeObject().
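A minimal sketch of the transient-field pattern described above (class and field names here are illustrative, not from the Spark code base):

```java
import java.io.*;

// A class with a non-serializable field (a Thread, as a stand-in). The
// field is marked transient so Java serialization skips it, and
// readObject() reassigns it after deserialization.
public class Main implements Serializable {
    private final String name;        // serialized normally
    private transient Thread worker;  // skipped during serialization

    public Main(String name) {
        this.name = name;
        this.worker = new Thread(() -> {});
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();     // writes 'name' only
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();       // restores 'name'
        this.worker = new Thread(() -> {});  // rebuild the transient field
    }

    public static void main(String[] args) throws Exception {
        Main original = new Main("receiver-1");

        // Round-trip through Java serialization.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        Main copy;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = (Main) in.readObject();
        }
        System.out.println(copy.name + " " + (copy.worker != null));
    }
}
```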
> But when it comes to Spark Streaming's Receiver, things are different. When 
> you wish to customize an InputDStream, you must extend Receiver. However, 
> it turns out the Receiver is serialized with Kryo if you set the Kryo 
> serializer in SparkConf, and falls back to Java serialization if you 
> don't.
> So here's the problem: if you want to switch the serializer through 
> configuration and make sure the Receiver works correctly under both Java 
> and Kryo, you have to implement all four methods above. First, that is 
> redundant, since you end up writing the serialization/deserialization 
> logic almost twice; second, nothing in the documentation or the code tells 
> users to implement the KryoSerializable interface. 
> Since all other function parameters are serialized with Java only, I 
> suggest doing the same for the Receiver. It may be slower, but since the 
> serialization only happens once per batch interval, that is acceptable. 
> More importantly, it causes less trouble.
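To illustrate the redundancy the report describes, here is a sketch (not from the Spark code base; names are illustrative) of a Receiver-like class that must implement custom hooks for both serializers to survive a serializer switch in SparkConf. It requires the Kryo library on the classpath and is not runnable on its own:

```java
import java.io.*;
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.KryoSerializable;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

public class MyReceiverState implements Serializable, KryoSerializable {
    private String endpoint;              // serializable state
    private transient Object connection;  // rebuilt after deserialization

    // --- Java serialization hooks ---
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
    }
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        this.connection = reconnect();
    }

    // --- Kryo hooks: the same logic, written a second time ---
    @Override
    public void write(Kryo kryo, Output output) {
        output.writeString(endpoint);
    }
    @Override
    public void read(Kryo kryo, Input input) {
        this.endpoint = input.readString();
        this.connection = reconnect();
    }

    private Object reconnect() { return new Object(); } // placeholder
}
```

If the Receiver were always Java-serialized, as the report suggests, only the first pair of hooks would be needed.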



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
