Davies Liu created SPARK-4531:
---------------------------------

             Summary: Cache serialized java objects instead of serialized python objects in MLlib
                 Key: SPARK-4531
                 URL: https://issues.apache.org/jira/browse/SPARK-4531
             Project: Spark
          Issue Type: Improvement
          Components: MLlib, PySpark
    Affects Versions: 1.2.0
            Reporter: Davies Liu
            Priority: Blocker


Pyrolite is pretty slow (compared to the ad hoc serializer in 1.1), and it 
causes a significant performance regression in 1.2, because we cache the 
serialized Python objects in the JVM and deserialize them into Java objects 
at each step.

We should cache the deserialized JavaRDD instead of the PythonRDD, so that 
the Pyrolite deserialization is not repeated at each step.
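
As a rough illustration of the cost difference (a toy sketch only, not Spark 
or MLlib code): the snippet below uses Python's pickle as a stand-in for 
Pyrolite and plain lists as a stand-in for cached RDDs. The names 
CountingDeserializer, run_cached_serialized and run_cached_deserialized are 
hypothetical and exist only for this example. It counts how many times 
records are deserialized under each caching strategy.

```python
import pickle


class CountingDeserializer:
    """Hypothetical stand-in for Pyrolite: counts the records it unpickles."""

    def __init__(self):
        self.calls = 0

    def loads(self, blob):
        self.calls += 1
        return pickle.loads(blob)


def run_cached_serialized(blobs, steps, deser):
    """Cache serialized bytes and deserialize every record at every step."""
    total = 0.0
    for _ in range(steps):
        for blob in blobs:
            total += sum(deser.loads(blob))
    return total


def run_cached_deserialized(blobs, steps, deser):
    """Deserialize each record once, cache the objects, then iterate."""
    cached = [deser.loads(blob) for blob in blobs]
    total = 0.0
    for _ in range(steps):
        for point in cached:
            total += sum(point)
    return total


# 100 toy "feature vectors", serialized once up front.
points = [[float(i), float(i + 1)] for i in range(100)]
blobs = [pickle.dumps(p) for p in points]

before = CountingDeserializer()
after = CountingDeserializer()
result_before = run_cached_serialized(blobs, steps=10, deser=before)
result_after = run_cached_deserialized(blobs, steps=10, deser=after)

print(before.calls, after.calls)
```

With 100 records and 10 steps, the first strategy deserializes every record 
in every step, while the second deserializes each record only once. Both 
compute the same result.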



