Improve the results of MAHOUT-145 by uniformly distributing the classes in the partitioned data
-----------------------------------------------------------------------------------------------
Key: MAHOUT-216
URL: https://issues.apache.org/jira/browse/MAHOUT-216
Project: Mahout
Issue Type: Improvement
Components: Classification
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim

The poor results of the partial decision forest implementation may be explained by how the classes are distributed across the data partitions. For example, if a partition does not contain any instance of a given class, the decision trees built from that partition cannot predict that class.

According to [CHAN, 95]:
{quote}
Random Selection of the partitioned data sets with a uniform distribution of classes is perhaps the most sensible solution. Here we may attempt to maintain the same frequency distribution over the "class attribute" so that each partition represents a good but a smaller model of the entire training set
{quote}

[CHAN, 95]: Philip K. Chan, "On the Accuracy of Meta-learning for Scalable Data Mining"
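
To make the idea of class-uniform partitioning concrete, here is a minimal in-memory sketch in Python. It is not the Mahout implementation and does not reflect how MapReduce input splits are built. The function name `stratified_partitions` and its parameters are hypothetical, chosen only for this illustration. It shuffles the instances of each class and deals them round-robin across the partitions, so every partition keeps roughly the same class frequencies as the full training set.

```python
import random
from collections import Counter, defaultdict


def stratified_partitions(labels, num_partitions, seed=0):
    """Return a list of partitions, each a list of instance indices.

    Hypothetical helper for illustration only: instances of each class
    are shuffled, then dealt round-robin across the partitions so that
    per-class counts differ by at most one between any two partitions.
    """
    rng = random.Random(seed)

    # Group instance indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    partitions = [[] for _ in range(num_partitions)]
    offset = 0  # carry the round-robin position across classes
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            partitions[(offset + position) % num_partitions].append(index)
        offset += len(indices)
    return partitions


# Usage: a skewed data set with a rare class "c".
labels = ["a"] * 90 + ["b"] * 9 + ["c"] * 3
parts = stratified_partitions(labels, num_partitions=3)
class_counts = [Counter(labels[i] for i in part) for part in parts]
```

In this example every partition receives instances of all three classes, including the rare class "c", which a purely sequential split of the same data would confine to the last partition.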