Better handling of Categorical attributes when building Decision Forests
------------------------------------------------------------------------

Key: MAHOUT-245
URL: https://issues.apache.org/jira/browse/MAHOUT-245
Project: Mahout
Issue Type: Improvement
Components: Classification
Affects Versions: 0.3
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim
Fix For: 0.3

When building a decision tree, a random subset of all variables (attributes) is considered for the split at each node. If a Categorical variable is selected, the data available at the node is split so that all instances in each child node share the same value for that variable. The selected variable should therefore not be selected again in any sub-node, but the current implementation does not account for that. The resulting tree may contain redundant nodes that do not impair its classification performance but are nonetheless unnecessary.
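
One possible way to avoid re-selecting a Categorical attribute is to carry a per-path "already used" flag array down the tree and filter the random candidate subset against it. The following is a minimal sketch of that idea, not Mahout's actual code: the class name CategoricalSplitSketch and the methods candidateAttributes and usedForChildren are hypothetical, and the sketch assumes that only Categorical attributes are excluded, since a numerical attribute can still be split again on a different threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; these names are not part of Mahout.
class CategoricalSplitSketch {

    /**
     * Picks up to m candidate attributes for a node split, skipping every
     * attribute flagged in `used` (Categorical attributes that an ancestor
     * node has already split on).
     */
    public static List<Integer> candidateAttributes(boolean[] used, int m, Random rng) {
        List<Integer> available = new ArrayList<>();
        for (int attr = 0; attr < used.length; attr++) {
            if (!used[attr]) {
                available.add(attr);
            }
        }
        // Random subset: shuffle the remaining attributes and keep the first m.
        Collections.shuffle(available, rng);
        return new ArrayList<>(available.subList(0, Math.min(m, available.size())));
    }

    /**
     * Returns the flags to hand to the children of a node that split on `attr`.
     * The parent's array is copied so sibling subtrees are not affected.
     * Assumption: only Categorical attributes are marked as used.
     */
    public static boolean[] usedForChildren(boolean[] used, boolean[] categorical, int attr) {
        boolean[] copy = used.clone();
        if (categorical[attr]) {
            copy[attr] = true;
        }
        return copy;
    }

    public static void main(String[] args) {
        boolean[] categorical = {true, false, true}; // attributes 0 and 2 are Categorical
        boolean[] used = new boolean[categorical.length];
        Random rng = new Random(42);

        // Suppose the root splits on Categorical attribute 0.
        boolean[] childUsed = usedForChildren(used, categorical, 0);

        System.out.println("Candidates at root:  " + candidateAttributes(used, 2, rng));
        System.out.println("Candidates at child: " + candidateAttributes(childUsed, 2, rng));
    }
}
```

Copying the flag array for each child keeps the exclusion local to one root-to-leaf path, so an attribute used in one branch remains available in unrelated branches.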