[ 
https://issues.apache.org/jira/browse/MADLIB-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781659#comment-15781659
 ] 

Rahul Iyer commented on MADLIB-1051:
------------------------------------

So it looks like the sklearn function provides the following information:

- For an internal node, the feature and threshold that split the tuples
- Impurity of the node
- 'samples': Number of tuples landing on that node
- 'value': 
    -- If regression: Prediction if the node was treated as a leaf (float)
    -- If classification: The counts of the output labels among the samples
       in that node
- For classification, the actual prediction (predicted class) at that node
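
To make the list above concrete, here is a minimal sketch of how those fields 
could be assembled into a node label. The {{Node}} record, its field names, the 
{{node_label}} helper and the {{<=}} split notation are all hypothetical and 
chosen for illustration; they are not the sklearn or MADlib API.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Node:
    """Hypothetical per-node record holding the fields listed above."""
    impurity: float
    samples: int
    # Regression: mean prediction (float). Classification: per-class counts.
    value: Union[float, List[int]]
    feature: Optional[str] = None        # None for a leaf node
    threshold: Optional[float] = None    # None for a leaf node
    classes: Optional[List[str]] = None  # label names, classification only


def node_label(node: Node) -> str:
    """Build a multi-line text label for one node."""
    lines = []
    if node.feature is not None:
        # Assumption: tuples with feature <= threshold go to one child.
        lines.append("{} <= {:.4g}".format(node.feature, node.threshold))
    lines.append("impurity = {:.4g}".format(node.impurity))
    lines.append("samples = {}".format(node.samples))
    if isinstance(node.value, list):
        # Classification: show the counts and the majority class.
        lines.append("value = {}".format(node.value))
        best = max(range(len(node.value)), key=node.value.__getitem__)
        lines.append("class = {}".format(node.classes[best]))
    else:
        # Regression: show the prediction if this node were a leaf.
        lines.append("value = {:.4g}".format(node.value))
    return "\n".join(lines)


example = Node(impurity=0.42, samples=100, value=[30, 70],
               feature="age", threshold=42.5, classes=["no", "yes"])
print(node_label(example))
```

For the example node this prints the split, the impurity, the sample count, the 
class counts and the majority class ({{yes}}), one per line.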

I suggest we output the same information, since we already compute all of it 
in our tree functions. For regression, I suggest also showing the standard 
deviation alongside the mean value, e.g. {{value = 0.5712 (+- 0.3)}}, and 
using a human-readable format for the floats (in case the values get too big). 
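
One possible way to produce the regression label suggested above is sketched 
below. The function name {{format_regression_value}} and the choice of four 
significant digits are assumptions made for illustration, not an existing 
MADlib function or an agreed format.

```python
def format_regression_value(mean, std, sig_digits=4):
    """Format a regression node value as 'value = <mean> (+- <std>)'.

    The 'g' presentation type keeps at most sig_digits significant digits
    and switches to scientific notation for very large or very small
    numbers, which keeps the label short.
    """
    fmt = "{:." + str(sig_digits) + "g}"
    return "value = {} (+- {})".format(fmt.format(mean), fmt.format(std))


print(format_regression_value(0.5712, 0.3))
# prints: value = 0.5712 (+- 0.3)
print(format_regression_value(123456789.0, 2500000.0))
# prints: value = 1.235e+08 (+- 2.5e+06)
```

Using significant digits rather than a fixed number of decimals means small 
and large values both stay readable without a per-dataset setting.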

> Display split values in DT visualization
> ----------------------------------------
>
>                 Key: MADLIB-1051
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1051
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Decision Tree
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>            Priority: Minor
>             Fix For: v1.10
>
>         Attachments: skl_dt_cl.pdf, skl_dt_reg.pdf, tree_viz.jpg
>
>
> DT visualization needs a better description in the docs, and should show 
> split values in the output viz. It could look something like the attached 
> picture.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
