[ 
https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100151#comment-16100151
 ] 

Saurabh Agrawal edited comment on SPARK-21476 at 7/26/17 6:56 AM:
------------------------------------------------------------------

[~peng.m...@intel.com] I am using it in Spark Streaming, where I give the 
application 16 cores. The dataset in each batch has around 100 partitions. The 
model has 120 trees and was trained with a max depth of 15. The number of 
features is around 100. 


was (Author: sagraw):
I am using it in Spark Streaming, where I give the application 16 cores. The 
dataset in each batch has around 100 partitions. The model has 120 trees and 
was trained with a max depth of 15. The number of features is around 100. 

> RandomForest classification model not using broadcast in transform
> ------------------------------------------------------------------
>
>                 Key: SPARK-21476
>                 URL: https://issues.apache.org/jira/browse/SPARK-21476
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Saurabh Agrawal
>
> I notice significant task deserialization latency while running prediction 
> with pipelines that use RandomForestClassificationModel. While digging into 
> the source, I found that RandomForestClassificationModel inherits its 
> transform method from its parent, ProbabilisticClassificationModel. The only 
> concrete definition that RandomForestClassificationModel provides and that 
> transform actually uses is predictRaw, and predictRaw does not use 
> broadcasting.
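
The cost difference described in the report can be illustrated with a toy
model. This is not Spark code: pickle stands in for Spark's serializer, and
make_toy_forest, run_with_closure and run_with_broadcast are hypothetical
names invented for this sketch. It assumes that a model captured in the task
closure is deserialized by every task, while a broadcast value is deserialized
once per executor and then reused. The executor count is also an assumption,
since the report only states the core count.

```python
import pickle


def make_toy_forest(num_trees=120, nodes_per_tree=200):
    """Stand-in for a trained forest: each tree is a list of (feature, threshold) splits."""
    return [[(n % 100, 0.5) for n in range(nodes_per_tree)] for _ in range(num_trees)]


def run_with_closure(model, num_tasks):
    """Model travels inside the task closure, so every task deserializes all of it."""
    task_binary = pickle.dumps({"model": model})
    bytes_deserialized = 0
    for _ in range(num_tasks):
        pickle.loads(task_binary)  # whole model rebuilt for each task
        bytes_deserialized += len(task_binary)
    return bytes_deserialized


def run_with_broadcast(model, num_tasks, num_executors):
    """Model is a broadcast value: rebuilt once per executor, then served from a cache."""
    broadcast_blob = pickle.dumps(model)
    task_binary = pickle.dumps({"broadcast_id": 0})  # tasks carry only a small handle
    executor_cache = {}
    bytes_deserialized = 0
    for task in range(num_tasks):
        executor = task % num_executors  # simple round-robin task placement
        pickle.loads(task_binary)
        bytes_deserialized += len(task_binary)
        if executor not in executor_cache:
            executor_cache[executor] = pickle.loads(broadcast_blob)
            bytes_deserialized += len(broadcast_blob)
    return bytes_deserialized


if __name__ == "__main__":
    forest = make_toy_forest()
    # 100 tasks matches the ~100 partitions per batch; 4 executors is assumed.
    print("closure:  ", run_with_closure(forest, num_tasks=100))
    print("broadcast:", run_with_broadcast(forest, num_tasks=100, num_executors=4))
```

In this toy accounting the closure variant scales with the number of tasks,
while the broadcast variant scales with the number of executors plus a small
per-task handle. Real trees at depth 15 can be far larger than the toy trees
used here, so the absolute numbers carry no meaning; only the scaling does.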



