[ https://issues.apache.org/jira/browse/MAHOUT-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804272#action_12804272 ]

Sean Owen commented on MAHOUT-245:
----------------------------------

Can I commit this? Any objections?

> Better handling of Categorical attributes when building Decision Forests
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-245
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-245
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: 0.3
>
>         Attachments: mahout-245.patch
>
>
> When building a decision tree, a random subset of all variables
> (attributes) is considered for the split at each node.
> If a categorical variable is selected, the data available at the node
> is split so that, within each child node, every instance has the same
> value for the selected variable. That variable should therefore not be
> selected again in any sub-node, but the current implementation does not
> account for that. The resulting tree may contain redundant nodes that do
> not impair its classification performance but are nonetheless unnecessary.
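
To make the idea in the description concrete, here is a minimal sketch in Python. It is not the code in mahout-245.patch; the function names, the attribute names and the use of a frozenset to track exclusions are all assumptions made for illustration. The sketch excludes a categorical attribute from the candidate pool of every sub-node once it has been used for a split, while numerical attributes stay available because they can be split again at a different threshold.

```python
import random


def pick_candidates(attributes, excluded, m, rng):
    """Draw up to m candidate attributes at random, skipping excluded ones."""
    available = [a for a in attributes if a not in excluded]
    return rng.sample(available, min(m, len(available)))


def child_exclusions(excluded, split_attribute, categorical):
    """Return the exclusion set that the children of a node should inherit.

    A categorical split leaves a single value per child, so the attribute
    carries no more information below that node. Numerical attributes are
    left selectable.
    """
    if split_attribute in categorical:
        return excluded | {split_attribute}
    return excluded


# Hypothetical attributes, for illustration only.
attributes = ["color", "shape", "weight"]
categorical = {"color", "shape"}
rng = random.Random(0)

root_excluded = frozenset()
after_color = child_exclusions(root_excluded, "color", categorical)
after_weight = child_exclusions(after_color, "weight", categorical)

# Below a split on "color", only "shape" and "weight" remain selectable.
candidates = pick_candidates(attributes, after_weight, 2, rng)
```

Passing the exclusion set down each branch, rather than keeping one global set, means that a categorical attribute used in one subtree can still be selected in a sibling subtree.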

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
