[ 
https://issues.apache.org/jira/browse/SPARK-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526322#comment-14526322
 ] 

Bryan Cutler commented on SPARK-7127:
-------------------------------------

I have a couple of questions, [~josephkb], to make sure I'm on the right track:

#  In the ensembles' transform() methods, I broadcast the ensemble model with 
{{val bcastModel = dataset.sqlContext.sparkContext.broadcast(tmpModel)}} and 
then use it to predict with {{bcastModel.value.predict(features)}}. Is that 
all that needs to be done for broadcasting? Is the added benefit that the 
ensemble model is cached on each node while that node makes its predictions?
#  The common code shared by both ensemble transform() methods isn't something I 
would normally expect to belong in a "model" definition like 
{{TreeEnsembleModel}}. Would it make sense to put the transform logic in a 
separate trait, e.g. {{TreeEnsembleTransform}}?
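To make the second question concrete, here is a minimal plain-Scala sketch of the proposed split. It assumes simplified, hypothetical types ({{SimpleTree}}, {{Stump}}, {{SimpleTreeEnsemble}}, {{SimpleEnsembleTransform}}, {{SimpleForest}}, {{SimpleBoosted}}), not Spark's actual classes: the ensemble trait owns the trees, weights and predict(), each model only chooses how to combine tree outputs, and the transform logic lives once in a separate mix-in. It runs without Spark, so the broadcast step is only indicated in a comment.

```scala
// Hypothetical, simplified stand-ins for spark.ml tree types (NOT Spark's real classes):
// a "tree" here is a one-split decision stump over a dense feature array.
trait SimpleTree extends Serializable {
  def predict(features: Array[Double]): Double
}

final case class Stump(featureIndex: Int, threshold: Double, left: Double, right: Double)
    extends SimpleTree {
  // Go left if the chosen feature is at or below the threshold, otherwise right.
  def predict(features: Array[Double]): Double =
    if (features(featureIndex) <= threshold) left else right
}

// Shared ensemble abstraction, analogous in spirit to TreeEnsembleModel:
// it owns the trees and their weights; subclasses only decide how to combine outputs.
trait SimpleTreeEnsemble extends Serializable {
  def trees: Array[SimpleTree]
  def treeWeights: Array[Double]
  protected def combine(weighted: Array[Double]): Double

  def predict(features: Array[Double]): Double =
    combine(trees.zip(treeWeights).map { case (tree, w) => w * tree.predict(features) })
}

// Hypothetical mix-in holding the transform logic once (the role proposed for
// TreeEnsembleTransform). In Spark, this is where the model would be broadcast and
// predict() wrapped in a UDF; here it simply maps predict() over local rows.
trait SimpleEnsembleTransform { self: SimpleTreeEnsemble =>
  def transformRows(rows: Seq[Array[Double]]): Seq[Double] = rows.map(r => self.predict(r))
}

// Random-forest-style ensemble: unweighted average of the tree outputs.
final class SimpleForest(val trees: Array[SimpleTree])
    extends SimpleTreeEnsemble with SimpleEnsembleTransform {
  val treeWeights: Array[Double] = Array.fill(trees.length)(1.0)
  protected def combine(weighted: Array[Double]): Double = weighted.sum / weighted.length
}

// GBT-style ensemble: weighted sum of the tree outputs.
final class SimpleBoosted(val trees: Array[SimpleTree], val treeWeights: Array[Double])
    extends SimpleTreeEnsemble with SimpleEnsembleTransform {
  protected def combine(weighted: Array[Double]): Double = weighted.sum
}
```

In the real spark.ml code, the shared transform in such a trait would be the single place that calls {{sparkContext.broadcast(...)}} and predicts with {{bcastModel.value.predict(features)}}, so the GBT and RandomForest models would not each need their own copy of that logic.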

I hope that makes sense. Thanks for your help!

> Broadcast spark.ml tree ensemble models for predict
> ---------------------------------------------------
>
>                 Key: SPARK-7127
>                 URL: https://issues.apache.org/jira/browse/SPARK-7127
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 1.4.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>              Labels: starter
>
> GBTRegressor/Classifier and RandomForestRegressor/Classifier should broadcast 
> models and then predict.  This will mean overriding transform().
> Note: Try to reduce duplicated code via the TreeEnsembleModel abstraction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
