Rahul Tanwani created SPARK-12773:
-------------------------------------
Summary: Impurity and Sample details for each node of a decision tree
Key: SPARK-12773
URL: https://issues.apache.org/jira/browse/SPARK-12773
Project: Spark
Issue Type: Question
Components: ML, MLlib
Affects Versions: 1.5.2
Reporter: Rahul Tanwani
I would like to understand whether each node in a decision tree calculates or
stores the number of samples that satisfy its split criteria. Looking at the
code, I found some information about the impurity statistics but nothing about
sample counts. scikit-learn exposes both of these metrics. The information
would help in cases where multiple decision rules (multiple leaf nodes) yield
the same prediction and we want to compare those decision paths against each
other.
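
To make the request concrete, here is a minimal sketch in plain Python of the two per-node quantities in question. It is an illustration only: the dict-based tree layout, the `node_stats` helper, and the toy data are hypothetical and are not Spark's or scikit-learn's API. It routes the training rows through a fitted tree and records the sample count and Gini impurity at every node.

```python
from collections import Counter

# Hypothetical tree layout (an assumption for illustration, not a library API):
# an internal node has "feature", "threshold", "left", "right";
# a leaf node has only "prediction".
tree = {
    "feature": 0, "threshold": 2.5,
    "left": {"prediction": 0},
    "right": {
        "feature": 1, "threshold": 1.0,
        "left": {"prediction": 1},
        "right": {"prediction": 0},
    },
}

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def node_stats(node, rows, labels, path="root", out=None):
    """Map each node path to (sample count, Gini impurity) for the given data."""
    if out is None:
        out = {}
    out[path] = (len(labels), gini(labels))
    if "prediction" in node:
        return out  # leaf: nothing further to split
    feature, threshold = node["feature"], node["threshold"]
    # Rows with feature value <= threshold go left, the rest go right.
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    node_stats(node["left"], [rows[i] for i in left],
               [labels[i] for i in left], path + "/L", out)
    node_stats(node["right"], [rows[i] for i in right],
               [labels[i] for i in right], path + "/R", out)
    return out

# Toy training data: (feature 0, feature 1) rows and their class labels.
rows = [(1.0, 0.5), (2.0, 2.0), (3.0, 0.5), (3.5, 0.8), (4.0, 2.0), (5.0, 3.0)]
labels = [0, 0, 1, 1, 0, 1]

stats = node_stats(tree, rows, labels)
for path, (count, impurity) in stats.items():
    print(path, count, round(impurity, 3))
```

In this toy tree the leaves `root/L` and `root/R/R` both predict class 0, but the first is pure while the second has impurity 0.5. This is the kind of comparison between decision paths that per-node statistics would allow.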