Xiangrui Meng created SPARK-2293:
------------------------------------

             Summary: Replace RDD.zip usage by map with predict inside.
                 Key: SPARK-2293
                 URL: https://issues.apache.org/jira/browse/SPARK-2293
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Xiangrui Meng
            Priority: Minor


In our guide, we use

{code}
val prediction = model.predict(test.map(_.features))
val predictionAndLabel = prediction.zip(test.map(_.label))
{code}

This is not efficient because test will be computed twice. We should change it 
to

{code}
val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
{code}

It is also nice to add a `predictWith` method to predictive models.

{code}
def predictWith[V](RDD[(Vector, V)]): RDD[(Double, V)]
{code}

But I'm not sure whether this is a good name. `predictWithValue`?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to