[ https://issues.apache.org/jira/browse/SPARK-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-6345:
---------------------------------
Target Version/s: 1.1.2, 1.2.2, 1.4.0, 1.3.1 (was: 1.3.1)
> Model update propagation during prediction in Streaming Regression
> ------------------------------------------------------------------
>
> Key: SPARK-6345
> URL: https://issues.apache.org/jira/browse/SPARK-6345
> Project: Spark
> Issue Type: Bug
> Components: MLlib, Streaming
> Reporter: Jeremy Freeman
> Assignee: Jeremy Freeman
>
> During streaming regression analyses (Streaming Linear Regression and
> Streaming Logistic Regression), model updates from training data are not
> reflected in subsequent calls to predictOn or predictOnValues, even though
> the updates themselves succeed. This may be due to recent changes to the
> model declaration. I have a working fix ready to submit ASAP, along with
> expanded test coverage.
> A temporary workaround is to retrieve and use the updated model within a
> foreachRDD, as in:
> {code}
> model.trainOn(trainingData)
> testingData.foreachRDD { rdd =>
>   // Fetch the current model inside the closure, once per batch
>   val latest = model.latestModel()
>   val predictions = rdd.map(lp => latest.predict(lp.features))
>   // ...print or other side effects...
> }
> {code}
> Or within a transform, as in:
> {code}
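> model.trainOn(trainingData)
> val predictions = testingData.transform { rdd =>
>   // Fetch the current model inside the closure, once per batch
>   val latest = model.latestModel()
>   // Pair each true label with the prediction from the latest model
>   rdd.map(lp => (lp.label, latest.predict(lp.features)))
> }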
> {code}
> Note that this does not affect Streaming KMeans, which works as expected for
> combinations of training and prediction.
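
The issue text does not confirm the cause of the symptom above (updates succeed, but predictions do not see them). One plausible mechanism is a model reference captured once, when the prediction function is declared, rather than looked up each time it is used. The following is a minimal plain-Scala sketch of that mechanism only. ToyModel, ToyStreamingModel, and their members are hypothetical stand-ins, not Spark classes, and the sketch makes no claim about how predictOn is actually implemented.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark classes.
final case class ToyModel(weight: Double) {
  def predict(x: Double): Double = weight * x
}

class ToyStreamingModel(initial: ToyModel) {
  // Mutable reference that a training step replaces.
  private var model: ToyModel = initial

  def update(next: ToyModel): Unit = { model = next }
  def latestModel(): ToyModel = model

  // Method value: eta-expansion evaluates the non-stable qualifier `model`
  // once, when this function is created, and binds to that instance.
  def capturedAtDeclaration: Double => Double = model.predict

  // Explicit lambda: `model` is re-read on every call.
  def resolvedPerCall: Double => Double = x => model.predict(x)
}

object StaleCaptureDemo {
  def main(args: Array[String]): Unit = {
    val streaming = new ToyStreamingModel(ToyModel(weight = 0.0))
    val stale = streaming.capturedAtDeclaration
    val fresh = streaming.resolvedPerCall

    streaming.update(ToyModel(weight = 2.0)) // simulate one training update

    println(s"captured at declaration: ${stale(3.0)}")
    println(s"resolved per call:       ${fresh(3.0)}")
    println(s"via latestModel():       ${streaming.latestModel().predict(3.0)}")
  }
}
```

Both workarounds quoted above follow the second pattern: they call latestModel() inside the per-batch closure, so each batch sees whatever model is current at that point.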
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)