Improve the results of MAHOUT-145 by uniformly distributing the classes in the 
partitioned data
-----------------------------------------------------------------------------------------------

                 Key: MAHOUT-216
                 URL: https://issues.apache.org/jira/browse/MAHOUT-216
             Project: Mahout
          Issue Type: Improvement
          Components: Classification
            Reporter: Deneche A. Hakim
            Assignee: Deneche A. Hakim


The poor results of the partial decision forest implementation may be explained 
by the way classes are distributed across the partitioned data. For example, if a 
partition does not contain any instance of a given class, the decision trees 
built from that partition will never be able to predict that class. 
According to [CHAN, 95]:

{quote}
Random Selection of the partitioned data sets with a uniform distribution of 
classes is perhaps the most sensible solution. Here we may attempt to maintain 
the same frequency distribution over the "class attribute" so that each 
partition represents a good but a smaller model of the entire training set
{quote}
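
As a rough illustration of what such a split could look like, here is a minimal sketch of stratified partitioning. It is not the code proposed for this issue: the class name StratifiedPartitioner, the method partition, and its parameters are hypothetical, and it assumes the class labels fit in memory as a list of integers. Instances are grouped by class, shuffled, and dealt round-robin, so each partition receives roughly the same share of every class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper (not part of Mahout): class-balanced partitioning of instance indices. */
final class StratifiedPartitioner {

  /**
   * @param labels        labels.get(i) is the class of instance i
   * @param numPartitions number of partitions to build
   * @param seed          seed for the shuffle, so the split is reproducible
   * @return one list of instance indices per partition
   */
  static List<List<Integer>> partition(List<Integer> labels, int numPartitions, long seed) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be positive");
    }

    // Group instance indices by class label.
    Map<Integer, List<Integer>> byClass = new LinkedHashMap<>();
    for (int i = 0; i < labels.size(); i++) {
      byClass.computeIfAbsent(labels.get(i), k -> new ArrayList<>()).add(i);
    }

    List<List<Integer>> partitions = new ArrayList<>();
    for (int p = 0; p < numPartitions; p++) {
      partitions.add(new ArrayList<>());
    }

    Random rng = new Random(seed);
    int next = 0; // carried over between classes so partition sizes stay balanced
    for (List<Integer> indices : byClass.values()) {
      // Random selection within each class, then deal the instances round-robin.
      Collections.shuffle(indices, rng);
      for (int index : indices) {
        partitions.get(next).add(index);
        next = (next + 1) % numPartitions;
      }
    }
    return partitions;
  }
}
```

With this scheme, the number of instances of any given class differs by at most one between two partitions, so a class is missing from a partition only when it has fewer instances than there are partitions.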

[CHAN, 95]: Philip K. Chan, "On the Accuracy of Meta-learning for Scalable Data 
Mining" 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
