[jira] Created: (MAHOUT-216) Improve the results of MAHOUT-145 by uniformly distributing the classes in the partitioned data

Deneche A. Hakim (JIRA) Thu, 10 Dec 2009 11:12:52 -0800

Improve the results of MAHOUT-145 by uniformly distributing the classes in the 
partitioned data
-----------------------------------------------------------------------------------------------


                 Key: MAHOUT-216
                 URL: https://issues.apache.org/jira/browse/MAHOUT-216
             Project: Mahout
          Issue Type: Improvement
          Components: Classification
            Reporter: Deneche A. Hakim
            Assignee: Deneche A. Hakim


The poor results of the partial decision forest implementation (MAHOUT-145) 
may be explained by the way classes are distributed across the data 
partitions. For example, if a partition does not contain any instance of a 
given class, the decision trees built from that partition cannot predict 
that class. 
According to [CHAN, 95]:

{quote}
Random Selection of the partitioned data sets with a uniform distribution of 
classes is perhaps the most sensible solution. Here we may attempt to maintain 
the same frequency distribution over the "class attribute" so that each 
partition represents a good but a smaller model of the entire training set
{quote}

[CHAN, 95]: Philip K. Chan, "On the Accuracy of Meta-learning for Scalable Data 
Mining" 
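
One possible way to obtain such partitions is sketched below. This is only an 
illustration of the idea, not the Mahout implementation: the class 
StratifiedPartitioner and its assign method are hypothetical names, and the 
sketch assumes all class labels fit in memory on a single machine. Instances 
are grouped by class, each group is shuffled, and the instances are then dealt 
to the partitions in round-robin order, so the per-class counts of any two 
partitions differ by at most one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper for illustration only; not part of Mahout.
class StratifiedPartitioner {

  /**
   * Returns, for each instance, the index of the partition it is assigned to.
   * labels.get(i) is the class label of instance i.
   */
  static int[] assign(List<String> labels, int numPartitions, long seed) {
    if (numPartitions < 1) {
      throw new IllegalArgumentException("numPartitions must be >= 1");
    }

    // Group instance indices by class label.
    Map<String, List<Integer>> byClass = new LinkedHashMap<>();
    for (int i = 0; i < labels.size(); i++) {
      byClass.computeIfAbsent(labels.get(i), k -> new ArrayList<>()).add(i);
    }

    Random rng = new Random(seed);
    int[] assignment = new int[labels.size()];
    // The cursor is carried over from one class to the next so that leftover
    // instances do not all pile up in partition 0.
    int next = 0;
    for (List<Integer> indices : byClass.values()) {
      // Random selection within each class.
      Collections.shuffle(indices, rng);
      // Deal the shuffled instances to the partitions in turn.
      for (int index : indices) {
        assignment[index] = next;
        next = (next + 1) % numPartitions;
      }
    }
    return assignment;
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList(
        "a", "a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c");
    // Prints the partition index chosen for each of the 12 instances.
    System.out.println(Arrays.toString(assign(labels, 2, 42L)));
  }
}
```

With this scheme a class is missing from a partition only when it has fewer 
instances than there are partitions. A distributed version would need a 
different mechanism to group the instances by class.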

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
