[ https://issues.apache.org/jira/browse/SPARK-9084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley closed SPARK-9084.
------------------------------------
Resolution: Later
> Add in support for realtime data predictions using ML PipelineModel
> -------------------------------------------------------------------
>
> Key: SPARK-9084
> URL: https://issues.apache.org/jira/browse/SPARK-9084
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Hollin Wilkins
> Priority: Minor
>
> Currently, ML provides excellent support for feature manipulation, model
> selection, and prediction on large datasets. The fitted models can all be
> easily serialized, but it is currently not possible to use them without a
> DataFrame. As a result, these models are only suitable for batch processing.
> To support realtime ML pipelines, I propose adding three new methods to the
> Transformer class:
>     def transform(row: StructuredRow): StructuredRow
>     def transform(row: StructuredRow, paramMap: ParamMap): StructuredRow
>     def transform(row: StructuredRow, firstParamPair: ParamPair[_],
>                   otherParamPairs: ParamPair[_]*): StructuredRow
> Here, StructuredRow would be a case class combining an
> org.apache.spark.sql.Row with an org.apache.spark.sql.types.StructType. An
> alternative would be to change the transform method signature to take two
> arguments, a StructType and a Row.
> This change would require every implementor of the Transformer class to
> implement the new transform method.
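A minimal sketch of the proposed row-level API, written in plain Scala so it runs without the Spark jars. This is an assumption-laden illustration, not Spark API: the hypothetical `Schema` type stands in for `StructType`, a `Vector[Any]` stands in for `Row`, and `Map[String, Any]` / `(String, Any)` stand in for `ParamMap` / `ParamPair[_]`. `RowTransformer` and `RowBinarizer` are illustrative names. The two param-taking overloads delegate to `copy(extra).transform(row)`, mirroring how Spark's DataFrame-based `Transformer.transform(dataset, paramMap)` is built on `copy`, so each implementor would only need to write `copy` and the one-argument `transform`.

```scala
// Hypothetical stand-in for org.apache.spark.sql.types.StructType: ordered column names only.
final case class Schema(fieldNames: Vector[String]) {
  def fieldIndex(name: String): Int = {
    val i = fieldNames.indexOf(name)
    require(i >= 0, s"no field named $name")
    i
  }
}

// Hypothetical StructuredRow: one record's values (the Row part) plus its schema.
final case class StructuredRow(values: Vector[Any], schema: Schema) {
  def get(name: String): Any = values(schema.fieldIndex(name))
  // Appends an output column, as Transformers do for DataFrames.
  def withColumn(name: String, value: Any): StructuredRow =
    StructuredRow(values :+ value, Schema(schema.fieldNames :+ name))
}

// Hypothetical row-level counterpart of the three proposed overloads.
trait RowTransformer {
  // Returns a copy of this transformer with `extra` params overriding its own.
  def copy(extra: Map[String, Any]): RowTransformer

  def transform(row: StructuredRow): StructuredRow

  def transform(row: StructuredRow, paramMap: Map[String, Any]): StructuredRow =
    copy(paramMap).transform(row)

  def transform(row: StructuredRow,
                firstParamPair: (String, Any),
                otherParamPairs: (String, Any)*): StructuredRow =
    transform(row, (firstParamPair +: otherParamPairs).toMap)
}

// Hypothetical implementor with Binarizer-like semantics: 1.0 if value > threshold, else 0.0.
final class RowBinarizer(val inputCol: String, val outputCol: String, val threshold: Double)
    extends RowTransformer {
  // Only these three keys are recognized; any other key is ignored in this sketch.
  def copy(extra: Map[String, Any]): RowBinarizer =
    new RowBinarizer(
      extra.getOrElse("inputCol", inputCol).asInstanceOf[String],
      extra.getOrElse("outputCol", outputCol).asInstanceOf[String],
      extra.getOrElse("threshold", threshold).asInstanceOf[Double])

  def transform(row: StructuredRow): StructuredRow = {
    val x = row.get(inputCol).asInstanceOf[Double]
    row.withColumn(outputCol, if (x > threshold) 1.0 else 0.0)
  }
}

// Usage: the three overloads applied to a one-column row.
val row = StructuredRow(Vector(0.7), Schema(Vector("score")))
val binarizer = new RowBinarizer("score", "label", threshold = 0.5)
val plain = binarizer.transform(row)                                              // label = 1.0
val viaMap = binarizer.transform(row, Map[String, Any]("threshold" -> 0.9))       // label = 0.0
val viaPairs = binarizer.transform(row, "threshold" -> 0.9, "outputCol" -> "flag") // flag = 0.0
```

Routing the param overloads through `copy` keeps the transformer itself immutable: a per-request override produces a new instance instead of mutating a model shared across requests.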
> With this change in place, it would be trivial to include the Spark jars in an
> API server, deserialize an ML PipelineModel, convert incoming user data into a
> StructuredRow, and feed it into the PipelineModel to get a realtime result.
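A minimal sketch of that serving path under stated assumptions: the fitted stages are folded over a single StructuredRow, the same order in which PipelineModel.transform applies its stages to a DataFrame. To keep it runnable without Spark, the hypothetical `Schema` type and `Vector[Any]` values stand in for `StructType` and `Row`; `RowPipelineModel`, `RowLinearScorer`, `RowThreshold`, and `rowFrom` are illustrative names, not Spark API. Model deserialization and the HTTP layer are left out.

```scala
// Hypothetical stand-in for StructType: ordered column names only.
final case class Schema(fieldNames: Vector[String]) {
  def fieldIndex(name: String): Int = {
    val i = fieldNames.indexOf(name)
    require(i >= 0, s"no field named $name")
    i
  }
  // Builds a row from request fields in schema order; throws if a field is missing.
  def rowFrom(request: Map[String, Any]): StructuredRow =
    StructuredRow(fieldNames.map(name => request(name)), this)
}

// Hypothetical StructuredRow: one record's values plus its schema.
final case class StructuredRow(values: Vector[Any], schema: Schema) {
  def get(name: String): Any = values(schema.fieldIndex(name))
  def withColumn(name: String, value: Any): StructuredRow =
    StructuredRow(values :+ value, Schema(schema.fieldNames :+ name))
}

trait RowTransformer {
  def transform(row: StructuredRow): StructuredRow
}

// Hypothetical PipelineModel analogue: applies each fitted stage in order.
final case class RowPipelineModel(stages: Seq[RowTransformer]) extends RowTransformer {
  def transform(row: StructuredRow): StructuredRow =
    stages.foldLeft(row)((current, stage) => stage.transform(current))
}

// Hypothetical fitted stage: linear score w . x + b over named feature columns.
final case class RowLinearScorer(featureCols: Vector[String], weights: Vector[Double],
                                 intercept: Double, outputCol: String) extends RowTransformer {
  def transform(row: StructuredRow): StructuredRow = {
    val xs = featureCols.map(c => row.get(c).asInstanceOf[Double])
    val score = xs.zip(weights).map { case (x, w) => x * w }.sum + intercept
    row.withColumn(outputCol, score)
  }
}

// Hypothetical fitted stage: 1.0 if the input exceeds the threshold, else 0.0.
final case class RowThreshold(inputCol: String, outputCol: String, threshold: Double)
    extends RowTransformer {
  def transform(row: StructuredRow): StructuredRow = {
    val x = row.get(inputCol).asInstanceOf[Double]
    row.withColumn(outputCol, if (x > threshold) 1.0 else 0.0)
  }
}

// Usage: one incoming request (e.g. parsed JSON) scored by a two-stage model.
val model = RowPipelineModel(Seq(
  RowLinearScorer(Vector("x1", "x2"), Vector(0.5, -1.0), intercept = 0.25, outputCol = "score"),
  RowThreshold("score", "prediction", threshold = 0.0)))
val request: Map[String, Any] = Map("x2" -> 1.0, "x1" -> 4.0)
val result = model.transform(Schema(Vector("x1", "x2")).rowFrom(request))
// score = 4.0 * 0.5 + 1.0 * -1.0 + 0.25 = 1.25, so prediction = 1.0
```

In a server, the model would be loaded once at startup and `model.transform(schema.rowFrom(parsedRequest))` called per request, with no DataFrame or SparkContext on the request path.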
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)