Options in Bayes TrainClassifier and TestClassifier
---------------------------------------------------

                 Key: MAHOUT-509
                 URL: https://issues.apache.org/jira/browse/MAHOUT-509
             Project: Mahout
          Issue Type: Bug
          Components: Classification
            Reporter: Joe Prasanna Kumar
            Priority: Minor
             Fix For: 0.4


Hi all,

As I was going through wikipedia example, I encountered a situation with 
TrainClassifier wherein some of the options with default values are actually 
mandatory. 
The documentation / command line help says that 
default source (--datasource) is hdfs but TrainClassifier has 
withRequired(true) while building the --datasource option. We are checking if 
the dataSourceType is hbase else set it to hdfs. so ideally withRequired should 
be set to false
default --classifierType is bayes but withRequired is set to true and we have 
code like
if ("bayes".equalsIgnoreCase(classifierType)) {
        log.info("Training Bayes Classifier");
        trainNaiveBayes(inputPath, outputPath, params);
        
      } else if ("cbayes".equalsIgnoreCase(classifierType)) {
        log.info("Training Complementary Bayes Classifier");
        // setup the HDFS and copy the files there, then run the trainer
        trainCNaiveBayes(inputPath, outputPath, params);
      }

which should be changed to

if ("cbayes".equalsIgnoreCase(classifierType)) {
        log.info("Training Complementary Bayes Classifier");
        trainCNaiveBayes(inputPath, outputPath, params);
        
      } else  {
        log.info("Training  Bayes Classifier");
        // setup the HDFS and copy the files there, then run the trainer
        trainNaiveBayes(inputPath, outputPath, params);
      }

Please let me know if this looks valid and I'll submit a patch for a JIRA issue.

reg
Joe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to