[
https://issues.apache.org/jira/browse/MADLIB-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781659#comment-15781659
]
Rahul Iyer commented on MADLIB-1051:
------------------------------------
So it looks like the sklearn function provides the following information:
- If internal node: the feature and threshold that split the tuples
- Impurity of the node
- 'samples': Number of tuples landing on that node
- 'value':
-- If regression: Prediction if the node was treated as a leaf (float)
-- If classification: The counts of the output labels among the samples in
that node
- If classification: the actual prediction from that node
I suggest we output the same information, since we compute all of it in our
tree functions. For regression, I suggest adding the standard deviation along
with the mean value, e.g. {{value = 0.5712 (+- 0.3)}}, and using a
human-readable format for the floats (in case the values get too big).
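A rough sketch of what such node labels could look like, in plain Python. This is illustrative only and is not MADlib code: the helper names ({{human_float}}, {{regression_node_label}}, {{classification_node_label}}), the {{<=}} split notation, the four-significant-digit format, and the use of the population standard deviation are all assumptions.

```python
import statistics


def human_float(x, sig=4):
    """Format a float with `sig` significant digits. The 'g' presentation
    type switches to scientific notation for very large or small values."""
    return f"{x:.{sig}g}"


def _split_line(feature, threshold):
    # Assumed notation for an internal node's split; leaves have no split.
    return f"{feature} <= {human_float(threshold)}"


def regression_node_label(responses, impurity, feature=None, threshold=None):
    """Label for a regression node, given the response values of the
    tuples landing on it (hypothetical helper)."""
    lines = []
    if feature is not None and threshold is not None:
        lines.append(_split_line(feature, threshold))
    mean = statistics.fmean(responses)
    std = statistics.pstdev(responses)  # assumption: population std dev
    lines.append(f"impurity = {human_float(impurity)}")
    lines.append(f"samples = {len(responses)}")
    lines.append(f"value = {human_float(mean)} (+- {human_float(std)})")
    return "\n".join(lines)


def classification_node_label(class_counts, impurity, feature=None, threshold=None):
    """Label for a classification node, given a dict that maps each output
    label to its count among the tuples on the node (hypothetical helper)."""
    lines = []
    if feature is not None and threshold is not None:
        lines.append(_split_line(feature, threshold))
    lines.append(f"impurity = {human_float(impurity)}")
    lines.append(f"samples = {sum(class_counts.values())}")
    lines.append(f"value = {list(class_counts.values())}")
    # Predicted class: the label with the highest count on this node.
    lines.append(f"class = {max(class_counts, key=class_counts.get)}")
    return "\n".join(lines)


print(regression_node_label([1.0, 3.0], impurity=1.0, feature="x1", threshold=2.0))
print(classification_node_label({"yes": 3, "no": 1}, impurity=0.375))
```

Leaf nodes simply omit the split line, so the same helpers cover both internal nodes and leaves.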
> Display split values in DT visualization
> ----------------------------------------
>
> Key: MADLIB-1051
> URL: https://issues.apache.org/jira/browse/MADLIB-1051
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Decision Tree
> Reporter: Frank McQuillan
> Assignee: Rahul Iyer
> Priority: Minor
> Fix For: v1.10
>
> Attachments: skl_dt_cl.pdf, skl_dt_reg.pdf, tree_viz.jpg
>
>
> DT visualization needs a better description in the docs, plus it should show
> split values in the output viz. Could look something like the attached picture.