[ https://issues.apache.org/jira/browse/MAHOUT-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deneche A. Hakim updated MAHOUT-245:
------------------------------------

    Status: Patch Available  (was: Open)

> Better handling of Categorical attributes when building Decision Forests
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-245
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-245
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: 0.3
>
>         Attachments: mahout-245.patch
>
>
> When building a decision tree, at each node a random subset of all
> variables (attributes) is considered for the node split.
> If a Categorical variable has been selected, the data available at the node
> is split such that each child node has the same value for the selected
> variable. In all sub-nodes the selected variable should therefore not be
> selected again, but the current implementation does not account for that.
> The resulting tree may contain redundant nodes that do not impair its
> classification performance but are nonetheless unnecessary.
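A minimal Java sketch of the idea, assuming each node receives a boolean mask of attributes already used by categorical splits on its path from the root. The class and method names (`CategoricalSplitSketch`, `randomAttributes`, `childMask`) are hypothetical and are not taken from mahout-245.patch or from Mahout's decision forest API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: not Mahout's actual tree-builder code.
public class CategoricalSplitSketch {

    /** Picks up to m distinct attribute indices at random, skipping attributes marked as used. */
    public static int[] randomAttributes(Random rng, boolean[] used, int m) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < used.length; i++) {
            if (!used[i]) {
                candidates.add(i);
            }
        }
        // Partial Fisher-Yates shuffle: the first k slots become a uniform random sample.
        int k = Math.min(m, candidates.size());
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(candidates.size() - i);
            Integer tmp = candidates.get(i);
            candidates.set(i, candidates.get(j));
            candidates.set(j, tmp);
        }
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = candidates.get(i);
        }
        Arrays.sort(result);
        return result;
    }

    /**
     * Builds the mask passed to the children of a node split on attribute attr.
     * Only categorical attributes are marked: every child of a categorical split
     * holds a single value for attr, so selecting it again cannot separate the data.
     * Numerical attributes stay selectable (assumption: a child may still be split
     * on a different threshold).
     */
    public static boolean[] childMask(boolean[] used, int attr, boolean categorical) {
        boolean[] child = used.clone(); // copy so sibling subtrees stay independent
        if (categorical) {
            child[attr] = true;
        }
        return child;
    }

    public static void main(String[] args) {
        // Assumed layout: attributes 0 and 2 are categorical, 1 and 3 numerical.
        boolean[] isCategorical = {true, false, true, false};
        boolean[] root = new boolean[isCategorical.length];

        // The root splits on categorical attribute 2, which is then excluded below it.
        boolean[] afterCat = childMask(root, 2, isCategorical[2]);
        // A numerical split on attribute 1 leaves the mask unchanged.
        boolean[] afterNum = childMask(afterCat, 1, isCategorical[1]);

        int[] picked = randomAttributes(new Random(42), afterNum, 4);
        System.out.println(Arrays.toString(picked)); // [0, 1, 3]
    }
}
```

Cloning the mask for each child means a categorical attribute consumed in one branch remains available to its sibling branches, which only share the ancestors' choices.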