[ 
https://issues.apache.org/jira/browse/SPARK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492928#comment-14492928
 ] 

Joseph K. Bradley commented on SPARK-3727:
------------------------------------------

[~maxkaznady] [~mqk] I've split this into some subtasks, and we can add others 
later (for boosted trees, for regression, etc.).  It would be great if you 
could follow the spark.ml tree API JIRA (linked above) and take a look at it 
once it's posted.  That JIRA (and the ProbabilisticClassifier class) will give 
you an idea of what's entailed in adding these under the Pipelines API.

Do you have preferences on how to split up these tasks?  If you can figure that 
out, I'll be happy to assign them.  Thanks!

> Trees and ensembles: More prediction functionality
> --------------------------------------------------
>
>                 Key: SPARK-3727
>                 URL: https://issues.apache.org/jira/browse/SPARK-3727
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>
> DecisionTree and RandomForest currently predict the most likely label for 
> classification and the mean for regression.  Additional information about 
> predictions would be useful:
> * For classification: the estimated probability of each possible label.
> * For regression: the variance of the estimate.
> RandomForest could also create aggregate predictions in multiple ways:
> * Predict the mean or median value for regression.
> * Compute the variance of estimates (across all trees) for both 
> classification and regression.
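
To make the aggregation ideas in the description concrete, here is a minimal
sketch in plain Python. It is not MLlib code and does not use any Spark API.
The function names (aggregate_regression, class_probabilities) and the idea of
passing in a plain list of per-tree outputs are assumptions made only for
illustration. Estimating label probabilities from the fraction of tree votes is
one possible approach; the issue does not say which method should be used.

```python
from collections import Counter
from statistics import mean, median, pvariance


def aggregate_regression(tree_predictions):
    """Summarize the per-tree regression outputs for a single input row.

    tree_predictions: list of floats, one prediction per tree (hypothetical input).
    """
    return {
        "mean": mean(tree_predictions),
        "median": median(tree_predictions),
        # Population variance of the predictions across all trees.
        "variance": pvariance(tree_predictions),
    }


def class_probabilities(tree_votes, labels):
    """Estimate each label's probability as the fraction of trees voting for it.

    tree_votes: list of predicted labels, one per tree (hypothetical input).
    labels: every possible label, so labels with zero votes still appear.
    """
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {label: counts[label] / total for label in labels}


if __name__ == "__main__":
    print(aggregate_regression([1.0, 2.0, 3.0, 6.0]))
    print(class_probabilities([0, 1, 1, 1], labels=[0, 1]))
```

An alternative for classification would be to average per-tree class
distributions instead of counting hard votes; the sketch uses vote fractions
only because they need nothing beyond each tree's predicted label.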



