[
https://issues.apache.org/jira/browse/SPARK-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526322#comment-14526322
]
Bryan Cutler commented on SPARK-7127:
-------------------------------------
I have a couple of questions, [~josephkb], to make sure I'm on the right track:
# In the ensembles' transform() methods, I broadcast the ensemble model like
this: {{val bcastModel = dataset.sqlContext.sparkContext.broadcast(tmpModel)}},
then use it to predict with {{bcastModel.value.predict(features)}}. Is that
all that needs to be done as far as broadcasting goes? Is the added benefit
that the ensemble model is cached on each node while it makes its predictions?
# The common code used in both ensemble transform() methods isn't what I would
normally expect to belong in a "model" definition like
{{TreeEnsembleModel}}. Would it make sense to put the transform logic in
a separate trait called {{TreeEnsembleTransform}}?
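
To make both questions concrete, here is a minimal sketch of the shape being described, not the actual patch. It assumes the Spark 1.4-era API ({{org.apache.spark.mllib.linalg.Vector}} as the feature type). {{VectorScorer}}, {{broadcastTransform}}, {{featuresCol}} and {{predictionCol}} are hypothetical names introduced only for illustration; {{TreeEnsembleTransform}} is the trait name proposed above, and its contents here are a guess.

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical stand-in for an ensemble model: anything that can
// score a single feature vector. Must be serializable to be broadcast.
trait VectorScorer extends Serializable {
  def predict(features: Vector): Double
}

// Hypothetical trait holding the transform logic shared by the ensembles,
// kept separate from the model definition itself.
trait TreeEnsembleTransform {
  // Assumed column names; a real spark.ml model would read these from Params.
  def featuresCol: String
  def predictionCol: String

  protected def broadcastTransform(dataset: DataFrame, model: VectorScorer): DataFrame = {
    // Ship the model to the executors as a broadcast variable.
    val bcastModel = dataset.sqlContext.sparkContext.broadcast(model)
    // The UDF closure captures only the broadcast handle, not the model.
    val predictUDF = udf { (features: Vector) => bcastModel.value.predict(features) }
    dataset.withColumn(predictionCol, predictUDF(col(featuresCol)))
  }
}
```

In this sketch each ensemble model's transform() would just call {{broadcastTransform(dataset, this)}}. As far as I understand broadcast variables, the value is sent to each executor once and cached there instead of being serialized into every task closure, which is the benefit question 1 asks about.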
I hope that makes sense. Thanks for your help!
> Broadcast spark.ml tree ensemble models for predict
> ---------------------------------------------------
>
> Key: SPARK-7127
> URL: https://issues.apache.org/jira/browse/SPARK-7127
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 1.4.0
> Reporter: Joseph K. Bradley
> Priority: Minor
> Labels: starter
>
> GBTRegressor/Classifier and RandomForestRegressor/Classifier should broadcast
> models and then predict. This will mean overriding transform().
> Note: Try to reduce duplicated code via the TreeEnsembleModel abstraction.