[ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347477#comment-14347477 ]
Joseph K. Bradley commented on SPARK-5981:
------------------------------------------
{quote}
I understand "store the model in Python and do prediction on an RDD using map +
single-instance predict within the map", which is quite intuitive. I think this
is being done all over in other places also right?
{quote}
Yes, it's done for GLMs.
{quote}
Could you explain what do you mean by "storing the model in Scala and do batch
prediction on an RDD"? What advantage does storing the model in scala and again
calling it for every feature within the map bring?
{quote}
You can examine DecisionTreeModel to see how it's done. It's a complicated
model, and writing the code to store it in a Python class will be a bit of work
(but will be done at some point). Using the Scala implementation is the easy
solution. With the model in Scala, there is also only one call to the JVM;
that is the part whose performance impact I'm unsure about once the model is
in Python. I suspect it will be fine, but we'll have to see.
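To make the "store the model in Python" option concrete, here is a minimal sketch of a tree kept entirely in Python with a single-instance predict. The names PyTreeNode and PyTreeModel are hypothetical and not part of PySpark, and the built-in map over a list only stands in for data.map(...) on an RDD:

```python
# Hypothetical pure-Python decision tree; these class names are illustrative
# and are not part of PySpark.
class PyTreeNode(object):
    def __init__(self, prediction=None, feature=None, threshold=None,
                 left=None, right=None):
        self.prediction = prediction  # set only on leaf nodes
        self.feature = feature        # index of the feature to split on
        self.threshold = threshold    # go left when value <= threshold
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


class PyTreeModel(object):
    def __init__(self, root):
        self.root = root

    def predict(self, features):
        # Single-instance prediction done entirely in Python: no JVM call
        # and no SparkContext, so it can be called inside a closure.
        node = self.root
        while not node.is_leaf():
            if features[node.feature] <= node.threshold:
                node = node.left
            else:
                node = node.right
        return node.prediction


# A small hand-built tree used only for illustration.
model = PyTreeModel(
    PyTreeNode(feature=0, threshold=0.5,
               left=PyTreeNode(prediction=0.0),
               right=PyTreeNode(feature=1, threshold=2.0,
                                left=PyTreeNode(prediction=1.0),
                                right=PyTreeNode(prediction=2.0))))

# Built-in map over a list stands in for data.map(...) on an RDD.
data = [[0.2, 5.0], [0.9, 1.0], [0.9, 3.0]]
predictions = list(map(lambda features: model.predict(features), data))
```

Because such a model holds only plain Python objects, it should be possible to ship it to the workers along with the closure. The cost is one Python-level predict call per instance, which is the performance question raised above.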
> pyspark ML models should support predict/transform on vector within map
> -----------------------------------------------------------------------
>
> Key: SPARK-5981
> URL: https://issues.apache.org/jira/browse/SPARK-5981
> Project: Spark
> Issue Type: Improvement
> Components: MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> Currently, most Python models only have limited support for single-vector
> prediction.
> E.g., one can call {code}model.predict(myFeatureVector){code} for a single
> instance, but that fails within a map for Python ML models and transformers
> which use JavaModelWrapper:
> {code}
> data.map(lambda features: model.predict(features))
> {code}
> This fails because JavaModelWrapper.call uses the SparkContext (within the
> transformation). (It works for linear models, which do prediction within
> Python.)
> Supporting prediction within a map would require storing the model and doing
> prediction/transformation within Python.