[ https://issues.apache.org/jira/browse/SPARK-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525070#comment-14525070 ]

Apache Spark commented on SPARK-6986:
-------------------------------------

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/5849

> Makes SparkSqlSerializer2 support sort-based shuffle with sort merge
> --------------------------------------------------------------------
>
>                 Key: SPARK-6986
>                 URL: https://issues.apache.org/jira/browse/SPARK-6986
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> *Update*: SPARK-4550 has exposed the interfaces. We can safely enable 
> Serializer2 to support sort merge.
> *Original description*:
> Our existing Java and Kryo serializers are both general-purpose serializers.
> They treat every object individually and encode the type of each object into
> the underlying stream. In Spark, it is common to serialize a collection of
> records that all have the same type (for example, the records of a
> DataFrame). In these cases, we do not need to write out the type of each
> record, and we can take advantage of the type information to build a
> specialized serializer. To do so, it seems we need to extend the interface of
> SerializationStream/DeserializationStream, so that a
> SerializationStream/DeserializationStream can have more information about the
> objects passed in (for example, whether an object is a key/value pair, a key,
> or a value).
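
The idea in the description above can be illustrated with a minimal sketch. It is not SparkSqlSerializer2's actual code: the type codes, class names, method names (write_key, write_value, read_key, read_value), and byte layout are all hypothetical, and Python is used only for brevity (Spark's real interfaces are in Scala and may differ). A general-purpose encoder writes a type code before every field, while a schema-aware stream that knows the key and value schemas up front writes only the field values. Separate key and value calls let the stream apply a different schema to each.

```python
import struct

# Hypothetical one-byte type codes; this is not Spark's wire format.
INT, DOUBLE, STRING = b"I", b"D", b"S"
_CODE_OF = {int: INT, float: DOUBLE, str: STRING}


def _encode(code, value):
    """Encode one field whose type is already known."""
    if code == INT:
        return struct.pack(">q", value)      # 8-byte signed integer
    if code == DOUBLE:
        return struct.pack(">d", value)      # 8-byte IEEE 754 double
    data = value.encode("utf-8")
    return struct.pack(">i", len(data)) + data  # length-prefixed string


def _decode(code, buf, pos):
    """Decode one field starting at buf[pos]; return (value, new_pos)."""
    if code == INT:
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if code == DOUBLE:
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    (n,) = struct.unpack_from(">i", buf, pos)
    start = pos + 4
    return buf[start:start + n].decode("utf-8"), start + n


def generic_serialize(records):
    """General-purpose style: a type code precedes every single field."""
    out = bytearray()
    for rec in records:
        for field in rec:
            code = _CODE_OF[type(field)]
            out += code + _encode(code, field)
    return bytes(out)


class SchemaSerializationStream:
    """Hypothetical stream that knows key/value schemas, so writes no type codes."""

    def __init__(self, key_schema, value_schema):
        self.key_schema, self.value_schema = key_schema, value_schema
        self.buf = bytearray()

    def write_key(self, key):
        for code, field in zip(self.key_schema, key):
            self.buf += _encode(code, field)

    def write_value(self, value):
        for code, field in zip(self.value_schema, value):
            self.buf += _encode(code, field)


class SchemaDeserializationStream:
    """Reads back keys and values using the same schemas the writer used."""

    def __init__(self, data, key_schema, value_schema):
        self.data, self.pos = data, 0
        self.key_schema, self.value_schema = key_schema, value_schema

    def _read(self, schema):
        fields = []
        for code in schema:
            value, self.pos = _decode(code, self.data, self.pos)
            fields.append(value)
        return tuple(fields)

    def read_key(self):
        return self._read(self.key_schema)

    def read_value(self):
        return self._read(self.value_schema)


# Usage: rows of (int key, (double, string) value) sharing one schema.
rows = [(1, 2.5, "a"), (2, 3.5, "bc")]
out = SchemaSerializationStream([INT], [DOUBLE, STRING])
for r in rows:
    out.write_key(r[:1])
    out.write_value(r[1:])
data = bytes(out.buf)

inp = SchemaDeserializationStream(data, [INT], [DOUBLE, STRING])
decoded = [inp.read_key() + inp.read_value() for _ in rows]
```

For these two rows the schema-aware encoding is 43 bytes versus 49 bytes for the tagged one: the saving is one type code per field, which grows linearly with the number of records. The trade-off is that the reader must be given exactly the schemas the writer used, since nothing in the stream describes them.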



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
