holdenk created SPARK-12151:
-------------------------------
Summary: Improve PySpark MLlib prediction performance when using
pickled vectors
Key: SPARK-12151
URL: https://issues.apache.org/jira/browse/SPARK-12151
Project: Spark
Issue Type: Improvement
Components: MLlib, PySpark
Reporter: holdenk
Priority: Minor
In a number of places inside PySpark MLlib, when predict is called on an RDD we
map the Python prediction function over the RDD. Instead, we could convert the
RDD to an RDD of pickled Vectors and then use the Java prediction function.
This would be useful for models that have optimized prediction on batches of
objects (e.g. by broadcasting the relevant parts of the model or similar).
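
The two strategies described above can be contrasted with a small plain-Python
analogy. This is only a sketch under stated assumptions: it uses no Spark APIs,
the class and function names (LinearModelSketch, predict_by_mapping,
predict_by_pickled_batch) are hypothetical, and the "backend" batch method
merely stands in for a JVM-side predict that receives pickled vectors.

```python
import pickle


class LinearModelSketch:
    """Hypothetical stand-in for a model; not part of PySpark."""

    def __init__(self, weights, intercept):
        self.weights = weights
        self.intercept = intercept

    def predict_one(self, vector):
        # Per-element path: analogous to the Python predict function
        # that is currently mapped over every element of the RDD.
        return sum(w * x for w, x in zip(self.weights, vector)) + self.intercept

    def predict_batch_pickled(self, payload):
        # Batch path: stands in for a backend predict that receives
        # pickled vectors and handles the whole batch in one call.
        vectors = pickle.loads(payload)
        return [self.predict_one(v) for v in vectors]


def predict_by_mapping(model, vectors):
    # Current approach: one Python-level call per element.
    return [model.predict_one(v) for v in vectors]


def predict_by_pickled_batch(model, vectors):
    # Proposed approach: serialize the vectors once, then make a single
    # call into the batch-capable predictor.
    payload = pickle.dumps(list(vectors))
    return model.predict_batch_pickled(payload)
```

Both functions return the same predictions; the difference is where the
per-element work happens and how many calls cross the serialization boundary.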