Davies Liu created SPARK-4531:
---------------------------------
Summary: Cache serialized Java objects instead of serialized
Python objects in MLlib
Key: SPARK-4531
URL: https://issues.apache.org/jira/browse/SPARK-4531
Project: Spark
Issue Type: Improvement
Components: MLlib, PySpark
Affects Versions: 1.2.0
Reporter: Davies Liu
Priority: Blocker
Pyrolite is considerably slower than the ad hoc serializer used in 1.1, and it
causes a significant performance regression in 1.2: we cache the serialized
Python objects in the JVM and deserialize them into Java objects at every step.
We should instead cache the deserialized JavaRDD rather than the PythonRDD, so
that the Pyrolite deserialization cost is not paid again on each step.
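
To illustrate the cost difference, here is a minimal sketch in plain Python. It
does not use Spark or Pyrolite: pickle stands in for the serializer, and the
names CountingCodec, run_cached_serialized and run_cached_deserialized are
hypothetical, introduced only for this example. It counts how often
deserialization runs when an iterative algorithm reads from a cache of
serialized objects versus a cache of already deserialized objects.

```python
import pickle


class CountingCodec:
    """Stand-in for a slow serializer; counts how often loads() runs."""

    def __init__(self):
        self.loads_calls = 0

    def dumps(self, obj):
        return pickle.dumps(obj)

    def loads(self, blob):
        self.loads_calls += 1
        return pickle.loads(blob)


def run_cached_serialized(blobs, codec, iterations):
    """Cache holds serialized bytes: every step deserializes every record."""
    total = 0.0
    for _ in range(iterations):
        for blob in blobs:
            total += sum(codec.loads(blob))
    return total


def run_cached_deserialized(blobs, codec, iterations):
    """Cache holds deserialized objects: deserialize once, then reuse."""
    points = [codec.loads(blob) for blob in blobs]
    total = 0.0
    for _ in range(iterations):
        for point in points:
            total += sum(point)
    return total


points = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
slow = CountingCodec()
fast = CountingCodec()
blobs = [slow.dumps(p) for p in points]

slow_total = run_cached_serialized(blobs, slow, iterations=10)
fast_total = run_cached_deserialized(blobs, fast, iterations=10)

print(slow.loads_calls, fast.loads_calls)
```

Both strategies compute the same result, but the first one deserializes every
record once per iteration, while the second deserializes each record only
once. The gap grows linearly with the number of iterations.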