Leonidas Fegaras created MRQL-98:
------------------------------------

             Summary: Improve Data Serialization in Spark Evaluation
                 Key: MRQL-98
                 URL: https://issues.apache.org/jira/browse/MRQL-98
             Project: MRQL
          Issue Type: Improvement
          Components: Run-Time/Spark
    Affects Versions: 0.9.8
            Reporter: Leonidas Fegaras
            Assignee: Leonidas Fegaras
            Priority: Critical


MRQL data (MRData) are serialized as Writable (for Hadoop Map-Reduce), Java 
Serializable (for Spark), and CopyableValue (for Flink). Until now, the Spark 
MRQL engine used a wrapper for MRData (called MRContainer) to serialize 
data with the Writable methods. However, some data in Spark mode were left 
unwrapped, so Spark fell back to default Java serialization, which is 
inefficient. With this patch, MRData becomes Serializable with custom 
serialization methods that are much more efficient. In my performance evaluation, 
the PageRank query over 10 million links, run on a cluster with 16 cores, shows a 38% 
improvement compared to the old Spark evaluation.
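A minimal sketch of the general pattern, assuming a value type that already has Writable-style write/readFields methods: Java's private writeObject/readObject hooks can delegate to those methods, because ObjectOutputStream implements DataOutput and ObjectInputStream implements DataInput. WritableLike and IntPair are hypothetical stand-ins, not MRQL's actual MRData classes, and the patch's real methods may differ.

```java
import java.io.*;

public class SerializationSketch {

    // Hypothetical interface mirroring the shape of Hadoop's Writable (write/readFields)
    interface WritableLike {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical MRData-like value that is both Writable-style and java.io.Serializable
    static final class IntPair implements WritableLike, Serializable {
        private static final long serialVersionUID = 1L;
        // transient: skipped by default serialization, written by hand in write()
        private transient int first;
        private transient int second;

        IntPair(int first, int second) { this.first = first; this.second = second; }

        int first() { return first; }
        int second() { return second; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        // Java serialization hook: reuse the Writable-style encoding
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // all fields are transient, so this writes no field data
            write(out);               // ObjectOutputStream is a DataOutput
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            readFields(in);           // ObjectInputStream is a DataInput
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        IntPair p = new IntPair(7, 42);
        IntPair q = (IntPair) deserialize(serialize(p));
        System.out.println(q.first() + " " + q.second()); // prints "7 42"
    }
}
```

Marking the fields transient keeps default serialization from also writing them, so the object's state on the wire is exactly what write() produces, and a single encoding serves both the Writable and the Serializable paths.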



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
