Joseph K. Bradley created SPARK-7412:
----------------------------------------
Summary: Designing distributed prediction model abstractions for spark.ml
Key: SPARK-7412
URL: https://issues.apache.org/jira/browse/SPARK-7412
Project: Spark
Issue Type: Brainstorming
Components: ML
Reporter: Joseph K. Bradley
The Pipelines API (spark.ml package) now includes abstractions for single-label
prediction: Predictor, Classifier, and Regressor. These assume that models are
local, so that their single-Row prediction methods can be used as UDFs. We need
to think about how to support distributed models in these abstractions.
Should the abstractions be modified somehow? Or should there be parallel (or
inheriting) abstractions, or a mix-in?
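To make the options concrete, here is a rough sketch of the mix-in variant in plain Scala, with no Spark dependency. Every name in it (LocalPredictionModel, DistributedPredictionModel, LocalAsDistributed, predictAll, and the two model classes) is hypothetical and not part of spark.ml. Array[Double] stands in for a feature vector and Seq stands in for a DataFrame. It is meant only to show how a local model could pick up a dataset-level interface while a model with partitioned parameters implements that interface directly.

```scala
// Hypothetical sketch: none of these names exist in spark.ml.

// Local model: fits in memory on one machine, so prediction is a pure
// function of a single row and could be wrapped as a UDF.
trait LocalPredictionModel {
  def predict(features: Array[Double]): Double
}

// Distributed model: prediction is only defined over a whole dataset.
// A Seq of rows stands in for a DataFrame here.
trait DistributedPredictionModel {
  def predictAll(dataset: Seq[Array[Double]]): Seq[Double]
}

// Mix-in option: any local model gets the dataset-level interface for free
// by mapping its single-row predict over the rows.
trait LocalAsDistributed extends DistributedPredictionModel {
  self: LocalPredictionModel =>
  override def predictAll(dataset: Seq[Array[Double]]): Seq[Double] =
    dataset.map(predict)
}

// A small linear model that is local and mixes in dataset-level prediction.
class LocalLinearModel(weights: Array[Double], intercept: Double)
    extends LocalPredictionModel with LocalAsDistributed {
  override def predict(features: Array[Double]): Double =
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept
}

// A linear model whose weights are split into (offset, block) pairs, a
// stand-in for weights partitioned across a cluster. It has no single-row
// predict, so it implements only the dataset-level interface.
class BlockedLinearModel(weightBlocks: Seq[(Int, Array[Double])], intercept: Double)
    extends DistributedPredictionModel {
  override def predictAll(dataset: Seq[Array[Double]]): Seq[Double] =
    dataset.map { features =>
      // Each block contributes a partial dot product over its own slice.
      val partials = weightBlocks.map { case (offset, block) =>
        block.indices.map(i => block(i) * features(offset + i)).sum
      }
      partials.sum + intercept
    }
}
```

In this sketch, code that only needs dataset-level prediction can accept a DistributedPredictionModel and work with either kind of model, while the UDF path stays available for models that also extend LocalPredictionModel. The parallel-hierarchy and inheriting options would differ mainly in where predictAll is declared.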
Motivation: We may start supporting distributed models since linear models,
random forests, and other models can get large enough to merit distributed
storage and computation.