Xiangrui Meng created SPARK-2293:
------------------------------------
Summary: Replace RDD.zip usage by map with predict inside.
Key: SPARK-2293
URL: https://issues.apache.org/jira/browse/SPARK-2293
Project: Spark
Issue Type: Improvement
Components: MLlib
Reporter: Xiangrui Meng
Priority: Minor
In our guide, we use
{code}
val prediction = model.predict(test.map(_.features))
val predictionAndLabel = prediction.zip(test.map(_.label))
{code}
This is inefficient because `test` is computed twice unless it is cached. We
should change it to
{code}
val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
{code}
It would also be nice to add a `predictWith` method to predictive models:
{code}
def predictWith[V](data: RDD[(Vector, V)]): RDD[(Double, V)]
{code}
But I'm not sure whether this is a good name. Maybe `predictWithValue`?
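For illustration, here is a minimal sketch of how such a method might behave, written as a standalone helper rather than a model method. The object name `PredictWithSketch`, the parameter names, and the choice to pass the per-vector predictor as a function are all assumptions made for this sketch; none of this is existing MLlib API.

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of MLlib.
object PredictWithSketch {
  // Applies a per-vector predictor inside a single map, so the input RDD
  // is traversed once and each prediction stays paired with the value
  // that accompanied its feature vector.
  def predictWith[V](predict: Vector => Double)(data: RDD[(Vector, V)]): RDD[(Double, V)] =
    data.map { case (features, value) => (predict(features), value) }
}
```

With a trained model and an `RDD[LabeledPoint]` named `test`, a call could look like `PredictWithSketch.predictWith(v => model.predict(v))(test.map(p => (p.features, p.label)))`, which yields the same pairs as the `map` version above.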