[ https://issues.apache.org/jira/browse/SPARK-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-6986:
----------------------------
    Description: Our existing Java and Kryo serializers are both general-purpose 
serializers. They treat every object individually and encode the type of each 
object into the underlying stream. In Spark, it is common to serialize a 
collection of records that all share the same type (for example, the records 
of a DataFrame). In these cases, we do not need to write out the type of every 
record, and we can take advantage of the type information to build a 
specialized serializer. To do so, it seems we need to extend the 
SerializationStream/DeserializationStream interface so that a 
SerializationStream/DeserializationStream can be given more information about 
the objects passed in (for example, whether an object is a key/value pair, a 
key, or a value).   (was: Our existing Java and Kryo serializer are both 
general-purpose serialize. They treat every object individually and encode the 
type of an object to underlying stream. For Spark, it is common that we 
serialize a collection with records having the same types (for example, 
records of a DataFrame). For these cases, we do not need to write out types of 
records and we can take advantage the type information to build specialized 
serializer. To do so, seems we need to extend the interface of 
SerializationStream/DeserializationStream, so a 
SerializationStream/DeserializationStream can have more information about 
objects passed in (for example, if a object is in the form of key/value pair, a 
key, or a value). )
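As a rough illustration of the idea (all names here are hypothetical sketches, 
not an actual Spark API): the new key/value-aware methods could default to the 
existing general-purpose writeObject path, so current serializers keep working, 
while a serializer specialized for a fixed key/value type overrides them to 
skip the per-record type tags.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of the proposed extension. writeKey/writeValue default
// to the general-purpose writeObject, so a stream that does not care about
// key/value semantics behaves exactly as before.
interface SerializationStream {
    SerializationStream writeObject(Object obj) throws IOException;

    default SerializationStream writeKey(Object key) throws IOException {
        return writeObject(key);
    }

    default SerializationStream writeValue(Object value) throws IOException {
        return writeObject(value);
    }
}

// Toy stream specialized for records with int keys and String values.
// Because every key and every value is known to have the same type, no
// per-record type information is written to the underlying stream.
class IntStringStream implements SerializationStream {
    private final DataOutputStream out;

    IntStringStream(DataOutputStream out) { this.out = out; }

    public SerializationStream writeObject(Object obj) throws IOException {
        // This specialized stream only supports key/value writes.
        throw new UnsupportedOperationException("use writeKey/writeValue");
    }

    @Override
    public SerializationStream writeKey(Object key) throws IOException {
        out.writeInt((Integer) key);   // 4 bytes, no type header
        return this;
    }

    @Override
    public SerializationStream writeValue(Object value) throws IOException {
        out.writeUTF((String) value);  // 2-byte length prefix + bytes, no type header
        return this;
    }
}
```

With this shape, a shuffle writer that knows it is emitting (key, value) pairs 
of fixed types could call writeKey/writeValue instead of writeObject and get 
the compact encoding automatically.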

> Make SerializationStream/DeserializationStream understand key/value semantic
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-6986
>                 URL: https://issues.apache.org/jira/browse/SPARK-6986
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>            Reporter: Yin Huai
>            Priority: Blocker
>
> Our existing Java and Kryo serializers are both general-purpose 
> serializers. They treat every object individually and encode the type of 
> each object into the underlying stream. In Spark, it is common to serialize 
> a collection of records that all share the same type (for example, the 
> records of a DataFrame). In these cases, we do not need to write out the 
> type of every record, and we can take advantage of the type information to 
> build a specialized serializer. To do so, it seems we need to extend the 
> SerializationStream/DeserializationStream interface so that a 
> SerializationStream/DeserializationStream can be given more information 
> about the objects passed in (for example, whether an object is a key/value 
> pair, a key, or a value). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
