[ https://issues.apache.org/jira/browse/MRQL-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590268#comment-15590268 ]
ASF GitHub Bot commented on MRQL-98:
------------------------------------

GitHub user fegaras opened a pull request:

    https://github.com/apache/incubator-mrql/pull/28

    [MRQL-98] Improve Data Serialization in Spark Evaluation

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fegaras/incubator-mrql MRQL-98

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-mrql/pull/28.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #28

----
commit ab6bac73711b0d5fa5f05fa2c9b4c558c0576042
Author: Leonidas Fegaras <fega...@cse.uta.edu>
Date:   2016-10-19T23:45:07Z

    [MRQL-98] Improve Data Serialization in Spark Evaluation

----

> Improve Data Serialization in Spark Evaluation
> ----------------------------------------------
>
>                 Key: MRQL-98
>                 URL: https://issues.apache.org/jira/browse/MRQL-98
>             Project: MRQL
>          Issue Type: Improvement
>          Components: Run-Time/Spark
>   Affects Versions: 0.9.8
>            Reporter: Leonidas Fegaras
>            Assignee: Leonidas Fegaras
>            Priority: Critical
>
> MRQL data (MRData) are serialized as Writable (for Hadoop Map-Reduce), Java
> Serializable (for Spark), and CopyableValue (for Flink). Until now, the Spark
> MRQL engine used a wrapper for MRData (called MRContainer) to serialize
> data using the Writable methods. Some data used in Spark mode, though, were
> left unwrapped, so Spark fell back to the default Java serialization, which
> is inefficient. With this patch, MRData becomes Serializable with custom
> serialization methods that are very efficient. My performance evaluation of
> the Pagerank query over 10 million links, run on a cluster with 16 cores,
> shows a 38% improvement over the old Spark evaluation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
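
For readers unfamiliar with the mechanism the issue refers to, the following is a minimal sketch of a Java class that is Serializable but supplies its own writeObject/readObject methods, so that Java serialization writes a compact, hand-chosen encoding of the fields instead of its default one. RankedNode, its fields, and roundTrip are hypothetical names made up for illustration; this is not the MRData code from the patch, and the sketch makes no claim about how much space or time the real change saves.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a data value; NOT the actual MRQL MRData class.
class RankedNode implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by default field serialization, written by hand below.
    private transient long id;
    private transient double rank;

    RankedNode(long id, double rank) {
        this.id = id;
        this.rank = rank;
    }

    long id() { return id; }
    double rank() { return rank; }

    // Invoked by ObjectOutputStream in place of default field serialization.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // no non-transient fields, so nothing extra is written
        out.writeLong(id);
        out.writeDouble(rank);
    }

    // Invoked by ObjectInputStream; must read fields in the order they were written.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        id = in.readLong();
        rank = in.readDouble();
    }

    // Helper (an assumption of this sketch): serialize to bytes and back.
    static RankedNode roundTrip(RankedNode node) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (RankedNode) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        RankedNode copy = roundTrip(new RankedNode(42L, 0.15));
        System.out.println(copy.id() + " " + copy.rank());
    }
}
```

Because the class still implements Serializable, a framework that relies on Java serialization can pick up these methods without a wrapper object around each value, which is the general idea the issue describes.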