[jira] [Commented] (MRQL-98) Improve Data Serialization in Spark Evaluation

ASF GitHub Bot (JIRA) Wed, 19 Oct 2016 17:51:06 -0700

    [ 
https://issues.apache.org/jira/browse/MRQL-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590268#comment-15590268
 ]


ASF GitHub Bot commented on MRQL-98:
------------------------------------

GitHub user fegaras opened a pull request:

    https://github.com/apache/incubator-mrql/pull/28

    [MRQL-98] Improve Data Serialization in Spark Evaluation

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fegaras/incubator-mrql MRQL-98

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-mrql/pull/28.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #28
    
----
commit ab6bac73711b0d5fa5f05fa2c9b4c558c0576042
Author: Leonidas Fegaras <fega...@cse.uta.edu>
Date:   2016-10-19T23:45:07Z

    [MRQL-98] Improve Data Serialization in Spark Evaluation

----


> Improve Data Serialization in Spark Evaluation
> ----------------------------------------------
>
>                 Key: MRQL-98
>                 URL: https://issues.apache.org/jira/browse/MRQL-98
>             Project: MRQL
>          Issue Type: Improvement
>          Components: Run-Time/Spark
>    Affects Versions: 0.9.8
>            Reporter: Leonidas Fegaras
>            Assignee: Leonidas Fegaras
>            Priority: Critical
>
> MRQL data (MRData) are serialized as Writable (for Hadoop Map-Reduce), Java
> Serializable (for Spark), and CopyableValue (for Flink). Until now, the Spark
> MRQL engine used a wrapper for MRData (called MRContainer) to serialize data
> through the Writable methods. Some data used in Spark mode, however, were
> left unwrapped, so Spark fell back to the default Java serialization for
> them, which is inefficient. With this patch, MRData itself becomes
> Serializable, with custom serialization methods that are much more
> efficient. In my performance evaluation, the PageRank query over 10 million
> links, run on a cluster with 16 cores, showed a 38% improvement over the old
> Spark evaluation.
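
For readers unfamiliar with the technique, the following is a minimal sketch of
how a class can be made Java Serializable with custom serialization methods
that reuse a compact, Writable-style encoding. It is not the code from the
patch: the class IntPair, its fields, and the helper methods are hypothetical
names chosen for illustration, and only standard java.io calls are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CustomSerializationSketch {

    /** Hypothetical stand-in for an MRData value; not MRQL's actual class. */
    static class IntPair implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: the default field-by-field encoding is skipped,
        // the custom methods below write the state instead
        transient int left;
        transient int right;

        IntPair(int left, int right) {
            this.left = left;
            this.right = right;
        }

        // Writable-style compact encoding: two raw ints, no field metadata
        void write(DataOutput out) throws IOException {
            out.writeInt(left);
            out.writeInt(right);
        }

        void readFields(DataInput in) throws IOException {
            left = in.readInt();
            right = in.readInt();
        }

        // Hooks that Java serialization calls in place of its default logic;
        // ObjectOutputStream/ObjectInputStream implement DataOutput/DataInput
        private void writeObject(ObjectOutputStream out) throws IOException {
            write(out);
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            readFields(in);
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Round trip through standard Java serialization
        IntPair copy = (IntPair) deserialize(serialize(new IntPair(3, 4)));
        System.out.println(copy.left + "," + copy.right);
    }
}
```

Because the object is directly Serializable, a framework that relies on Java
serialization can handle it without a separate wrapper object, while the bytes
written for the object's state are still produced by the compact
write/readFields pair.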



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
