Better handling of Categorical attributes when building Decision Forests
------------------------------------------------------------------------

                 Key: MAHOUT-245
                 URL: https://issues.apache.org/jira/browse/MAHOUT-245
             Project: Mahout
          Issue Type: Improvement
          Components: Classification
    Affects Versions: 0.3
            Reporter: Deneche A. Hakim
            Assignee: Deneche A. Hakim
             Fix For: 0.3


When building a decision tree, a random subset of all variables (attributes) 
is considered for the split at each node.
If a categorical variable is selected, the data available at the node is 
split so that, within each child node, every instance has the same value for 
that variable. The variable therefore cannot produce any further split in the 
sub-nodes and should not be selected again, but the current implementation 
does not account for that. The resulting tree may contain redundant nodes that 
do not impair its classification performance but are nonetheless unnecessary.
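
One possible way to account for this is sketched below. It is only an 
illustration under stated assumptions, not the actual Mahout code: the class 
AttributeSelector and the methods candidates and markUsed are hypothetical 
names. The idea is to carry a "used" flag per attribute down each branch and 
to draw the random candidate subset only from attributes not yet flagged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Mahout: tracks which categorical
// attributes have already been used for a split on the path to a node.
final class AttributeSelector {

    /**
     * Picks up to m candidate attribute indices at random, skipping every
     * attribute flagged in `used` (i.e. already split on by an ancestor).
     */
    static List<Integer> candidates(boolean[] used, int m, Random rng) {
        List<Integer> available = new ArrayList<>();
        for (int attr = 0; attr < used.length; attr++) {
            if (!used[attr]) {
                available.add(attr);
            }
        }
        Collections.shuffle(available, rng);
        int count = Math.min(m, available.size());
        return new ArrayList<>(available.subList(0, count));
    }

    /**
     * Returns a copy of `used` with `attr` flagged. A copy is returned so
     * that sibling branches, which share the parent's array, are unaffected.
     * Assumption: the caller invokes this only for categorical attributes.
     */
    static boolean[] markUsed(boolean[] used, int attr) {
        boolean[] copy = used.clone();
        copy[attr] = true;
        return copy;
    }
}
```

In this sketch the tree builder would call markUsed only after splitting on a 
categorical attribute and pass the returned array to the child nodes. A 
numerical attribute would stay available, since it can still be split again 
at a different threshold.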

-- 
This message is automatically generated by JIRA.