[ 
https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136515#comment-16136515
 ] 

Saurabh Agrawal commented on SPARK-21476:
-----------------------------------------

[[email protected]] I believe the effect on prediction time would be the 
same for RandomForestClassificationModel. In any case, broadcasting the model 
does no harm and should reduce prediction time, with the gain growing as the 
model size increases. 
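
To illustrate why the gain should grow with model size, here is a toy sketch in 
plain Python. It is not Spark code: ToyForest, run_with_closure and 
run_with_broadcast are hypothetical names, and pickle stands in for Spark's 
serializer. It assumes that a model captured in a task closure is deserialized 
once per task, while a broadcast model is deserialized once per executor and 
then reused by every task on that executor.

```python
import pickle


class ToyForest:
    """Hypothetical stand-in for a trained forest; size grows with num_trees."""

    def __init__(self, num_trees, nodes_per_tree=1000):
        self.trees = [
            [float(i % 7) for i in range(nodes_per_tree)]
            for _ in range(num_trees)
        ]

    def predict(self, x):
        # Arbitrary deterministic "prediction" so both paths can be compared.
        return sum(tree[x % len(tree)] for tree in self.trees)


def run_with_closure(model, partitions):
    """Every task deserializes its own copy of the model."""
    bytes_deserialized = 0
    results = []
    for part in partitions:
        task_payload = pickle.dumps(model)  # model serialized with the task closure
        local_model = pickle.loads(task_payload)
        bytes_deserialized += len(task_payload)
        results.extend(local_model.predict(x) for x in part)
    return results, bytes_deserialized


def run_with_broadcast(model, partitions, num_executors=2):
    """Each executor deserializes the model once and its tasks reuse it."""
    payload = pickle.dumps(model)
    executor_cache = {}
    bytes_deserialized = 0
    results = []
    for task_id, part in enumerate(partitions):
        executor = task_id % num_executors  # toy round-robin scheduling
        if executor not in executor_cache:
            executor_cache[executor] = pickle.loads(payload)
            bytes_deserialized += len(payload)
        local_model = executor_cache[executor]
        results.extend(local_model.predict(x) for x in part)
    return results, bytes_deserialized


model = ToyForest(num_trees=20)
partitions = [list(range(i, i + 5)) for i in range(0, 40, 5)]  # 8 tasks
closure_out, closure_bytes = run_with_closure(model, partitions)
bcast_out, bcast_bytes = run_with_broadcast(model, partitions)
```

Both paths produce the same predictions. In this toy setup the closure path 
deserializes the model once for each of the 8 tasks and the broadcast path once 
for each of the 2 executors, so the difference in bytes deserialized scales 
with the size of the model.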

> RandomForest classification model not using broadcast in transform
> ------------------------------------------------------------------
>
>                 Key: SPARK-21476
>                 URL: https://issues.apache.org/jira/browse/SPARK-21476
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Saurabh Agrawal
>
> I notice significant task deserialization latency when running prediction 
> with pipelines that use RandomForestClassificationModel. Digging into the 
> source, I found that RandomForestClassificationModel inherits its transform 
> method from its parent ProbabilisticClassificationModel, and that the only 
> concrete definition RandomForestClassificationModel provides that is 
> actually used in transform is predictRaw. predictRaw does not use 
> broadcasting.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

