[
https://issues.apache.org/jira/browse/SPARK-12773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094361#comment-15094361
]
Rahul Tanwani commented on SPARK-12773:
---------------------------------------
[~srowen] Thanks. Before filing this issue, I asked on the user mailing list but
did not get a reply. In case you have any information on this, here is the post:
http://apache-spark-user-list.1001560.n3.nabble.com/Impurity-and-Samples-details-for-each-node-of-a-decision-tree-td25941.html
> Impurity and Sample details for each node of a decision tree
> ------------------------------------------------------------
>
> Key: SPARK-12773
> URL: https://issues.apache.org/jira/browse/SPARK-12773
> Project: Spark
> Issue Type: Question
> Components: ML, MLlib
> Affects Versions: 1.5.2
> Reporter: Rahul Tanwani
>
> I want to understand whether each node in the decision tree calculates or
> stores the number of samples that satisfy its split criteria.
> Looking at the code, I found some information about the impurity statistics
> but nothing about the sample counts. scikit-learn exposes both of these
> metrics. This information would help in cases where multiple
> decision rules (multiple leaf nodes) yield the same prediction and we want
> to make relative comparisons of the decision paths.
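
To make the use case above concrete, here is a minimal sketch in plain Python of the two per-node metrics in question. It is not Spark or scikit-learn code: the helper names `gini` and `leaf_summary` and the label data are hypothetical, and Gini is assumed as the impurity measure. It shows two leaves that yield the same prediction but differ sharply in sample count and impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def leaf_summary(labels):
    """Per-leaf metrics: sample count, impurity, and majority-class prediction."""
    prediction = Counter(labels).most_common(1)[0][0]
    return {
        "samples": len(labels),
        "impurity": gini(labels),
        "prediction": prediction,
    }


# Hypothetical training labels that reached two different leaves.
leaf_a = leaf_summary([1] * 90 + [0] * 10)  # many samples, mostly class 1
leaf_b = leaf_summary([1] * 3 + [0] * 2)    # few samples, narrowly class 1

for name, leaf in (("leaf_a", leaf_a), ("leaf_b", leaf_b)):
    print(name, leaf)
```

Both leaves predict class 1, but the first is backed by far more samples and has much lower impurity, which is the kind of relative comparison of decision paths described above.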