holdenk created SPARK-12151:
-------------------------------

             Summary: Improve PySpark MLLib prediction performance when using pickled vectors
                 Key: SPARK-12151
                 URL: https://issues.apache.org/jira/browse/SPARK-12151
             Project: Spark
          Issue Type: Improvement
          Components: MLlib, PySpark
            Reporter: holdenk
            Priority: Minor

In a number of places inside PySpark MLlib, calling predict on an RDD maps the Python prediction function over the RDD. Instead, we could convert the RDD to an RDD of pickled Vectors and then use the Java prediction function. This would be useful for models that have optimized prediction on batches of objects (e.g. by broadcasting the relevant parts of the model or similar).
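
To make the contrast concrete, here is a minimal pure-Python sketch of the two paths. It does not use Spark and does not reflect PySpark's actual internals: ToyLinearModel, backend_predict_batch, predict_per_record, and predict_batched are hypothetical names invented for illustration, and backend_predict_batch only stands in for a JVM-side batch prediction function that receives pickled vectors.

```python
import pickle


class ToyLinearModel:
    """Hypothetical stand-in for a model; not a PySpark class."""

    def __init__(self, weights, intercept):
        self.weights = list(weights)
        self.intercept = intercept

    def predict(self, vector):
        # Per-record path: one Python-level call for every vector.
        return sum(w * x for w, x in zip(self.weights, vector)) + self.intercept


def backend_predict_batch(model, payload):
    # Stand-in for the Java prediction function (an assumption for this sketch):
    # it receives one pickled batch and scores all vectors in a single call.
    vectors = pickle.loads(payload)
    return [model.predict(v) for v in vectors]


def predict_per_record(model, vectors):
    # Roughly what mapping the Python prediction function over an RDD does.
    return [model.predict(v) for v in vectors]


def predict_batched(model, vectors, batch_size=2):
    # Proposed shape: pickle vectors in batches and hand each batch to the backend.
    results = []
    for start in range(0, len(vectors), batch_size):
        payload = pickle.dumps(vectors[start:start + batch_size])
        results.extend(backend_predict_batch(model, payload))
    return results


model = ToyLinearModel([0.5, -1.0], 0.25)
data = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]

per_record = predict_per_record(model, data)
batched = predict_batched(model, data)
print(per_record == batched)
```

Both paths should return the same predictions; the point of the batched shape is that the backend sees many vectors at once, which is where a model with an optimized batch prediction routine could save work.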