[ 
https://issues.apache.org/jira/browse/SPARK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492906#comment-14492906
 ] 

Max Kaznady commented on SPARK-3727:
------------------------------------

Yes, probabilities have to be added to other models too, like 
LogisticRegression. Right now they are hardcoded in two places but not 
outputted in PySpark.

I think is makes sense to split into PySpark, then classification, then 
probabilities, and then group different types of algorithms, all of which 
output probabilities: Logistic Regression, Random Forest, etc.

Can also add probabilities for trees by counting the number of leaf 1's and 0's.

What do you think?

> DecisionTree, RandomForest: More prediction functionality
> ---------------------------------------------------------
>
>                 Key: SPARK-3727
>                 URL: https://issues.apache.org/jira/browse/SPARK-3727
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>
> DecisionTree and RandomForest currently predict the most likely label for 
> classification and the mean for regression.  Other info about predictions 
> would be useful.
> For classification: estimated probability of each possible label
> For regression: variance of estimate
> RandomForest could also create aggregate predictions in multiple ways:
> * Predict mean or median value for regression.
> * Compute variance of estimates (across all trees) for both classification 
> and regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to