holdenk created SPARK-12151:
-------------------------------

             Summary: Improve PySpark MLlib prediction performance when using pickled vectors
                 Key: SPARK-12151
                 URL: https://issues.apache.org/jira/browse/SPARK-12151
             Project: Spark
          Issue Type: Improvement
          Components: MLlib, PySpark
            Reporter: holdenk
            Priority: Minor


In a number of places in PySpark MLlib, calling predict on an RDD maps the 
Python prediction function over the RDD. Instead, we could convert the RDD 
to an RDD of pickled Vectors and then use the Java prediction function. 
This would be useful for models that have optimized prediction on batches of 
objects (e.g. by broadcasting the relevant parts of the model or similar).
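
A rough, Spark-free sketch of the two paths described above may help. Everything in it is a hypothetical stand-in: ToyLinearModel, backend_predict_batch, and batch_size are illustrative names, not PySpark or MLlib APIs, and a plain Python function plays the role of the Java prediction function. It only shows the shape of the idea: serialize the model once, ship pickled batches of vectors, and predict on each batch in one call instead of once per record.

```python
import pickle


class ToyLinearModel:
    """Hypothetical stand-in for an MLlib model (not a PySpark class)."""

    def __init__(self, weights, intercept):
        self.weights = weights
        self.intercept = intercept

    def predict(self, x):
        # Per-record path: one Python-level call for every vector.
        return sum(w * v for w, v in zip(self.weights, x)) + self.intercept


def backend_predict_batch(model_bytes, batch_bytes):
    """Stand-in for the Java-side predict: takes pickled inputs (assumption)."""
    weights, intercept = pickle.loads(model_bytes)
    vectors = pickle.loads(batch_bytes)
    return [sum(w * v for w, v in zip(weights, x)) + intercept for x in vectors]


def predict_per_record(model, partition):
    # Current approach in the description: map the Python predict over records.
    return [model.predict(x) for x in partition]


def predict_batched(model, partition, batch_size=2):
    # Proposed approach, in miniature: serialize the model once (analogous to
    # broadcasting it), then hand pickled batches of vectors to the backend.
    model_bytes = pickle.dumps((model.weights, model.intercept))
    predictions = []
    for start in range(0, len(partition), batch_size):
        batch_bytes = pickle.dumps(partition[start:start + batch_size])
        predictions.extend(backend_predict_batch(model_bytes, batch_bytes))
    return predictions


model = ToyLinearModel([1.0, 2.0], 0.5)
data = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
per_record = predict_per_record(model, data)
batched = predict_batched(model, data)
```

Both paths return the same predictions; the difference is only in how many times the model and the per-call overhead are paid for. Whether this yields a real speedup in PySpark depends on the model and on serialization cost, which this sketch does not measure.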



