[ https://issues.apache.org/jira/browse/SPARK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511133#comment-14511133 ]

Oscar Olmedo commented on SPARK-3727:
-------------------------------------

Hello,

Here is a [link to my fork|https://github.com/apache/spark/compare/master...oscaroboto:master]
with proposed changes to return probabilities within mllib for
DecisionTreeModel and RandomForestModel. However, as [~josephkb] pointed out,
we should look into the spark.ml ProbabilisticClassifier class. I'm not yet
familiar with spark.ml, so I will take some time to familiarize myself with it.

Thanks for all the effort put into getting probabilities.

Oscar

> Trees and ensembles: More prediction functionality
> --------------------------------------------------
>
>                 Key: SPARK-3727
>                 URL: https://issues.apache.org/jira/browse/SPARK-3727
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>
> DecisionTree and RandomForest currently predict the most likely label for 
> classification and the mean for regression.  Other info about predictions 
> would be useful.
> For classification: estimated probability of each possible label
> For regression: variance of estimate
> RandomForest could also create aggregate predictions in multiple ways:
> * Predict mean or median value for regression.
> * Compute variance of estimates (across all trees) for both classification 
> and regression.
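
As a rough illustration of the aggregate predictions listed in the issue
description above, here is a minimal sketch in plain Python. It is not the
MLlib or spark.ml API: the function names class_probabilities and
regression_summary are hypothetical, and estimating class probabilities as
the fraction of trees voting for each label is an assumption (an
implementation could instead average per-leaf class distributions).

```python
from collections import Counter
from statistics import mean, median, pvariance


def class_probabilities(tree_votes, num_classes):
    """Estimate label probabilities as the fraction of trees voting for each label.

    tree_votes: list of integer labels, one per tree (hypothetical input format).
    """
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return [counts.get(label, 0) / total for label in range(num_classes)]


def regression_summary(tree_predictions):
    """Summarize per-tree regression outputs: mean, median, and population variance."""
    return {
        "mean": mean(tree_predictions),
        "median": median(tree_predictions),
        "variance": pvariance(tree_predictions),
    }


if __name__ == "__main__":
    # Four trees vote on a binary label; three regression trees predict a value.
    print(class_probabilities([1, 0, 1, 1], num_classes=2))
    print(regression_summary([2.0, 4.0, 6.0]))
```

The variance here is taken across trees only, so it measures disagreement
within the ensemble rather than the variance of labels inside a single leaf.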



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

