[ 
https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Prasanna Kumar updated MAHOUT-509:
--------------------------------------

    Attachment: MAHOUT-509.patch

The patch contains changes to:
1. TrainClassifier - sets default values for classifierType, dataSource, 
ngram and mindf
2. TestClassifier - sets default values for classifierType and dataSource, and 
rearranges the code that sets the default values
3. driver.classes.props - adds entries for WikipediaXmlSplitter and 
WikipediaDatasetCreatorDriver, so they can be executed using the mahout 
command-line utility.
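
As a rough illustration of what "setting default values" means for items 1 
and 2, here is a minimal, self-contained Java sketch. It is not the code in 
the attached patch: the class OptionDefaults and its method names are 
hypothetical, and it works on plain strings rather than on the command-line 
parser that TrainClassifier uses. The only facts taken from the issue are that 
--classifierType defaults to bayes and that --datasource is hbase only when 
explicitly requested, otherwise hdfs.

```java
// Hypothetical helper, not part of Mahout. It only illustrates how an
// optional command-line value can fall back to a documented default.
class OptionDefaults {
    static final String DEFAULT_CLASSIFIER_TYPE = "bayes";
    static final String DEFAULT_DATA_SOURCE = "hdfs";

    // Returns the supplied classifier type, or "bayes" when the option
    // was omitted (null) or blank.
    static String resolveClassifierType(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_CLASSIFIER_TYPE;
        }
        return raw.trim();
    }

    // Returns "hbase" only when it was explicitly requested; every other
    // value, including a missing option, resolves to "hdfs".
    static String resolveDataSource(String raw) {
        if (raw != null && "hbase".equalsIgnoreCase(raw.trim())) {
            return "hbase";
        }
        return DEFAULT_DATA_SOURCE;
    }

    public static void main(String[] args) {
        // Simulates a run where neither option was passed on the command line.
        System.out.println(resolveClassifierType(null)); // prints "bayes"
        System.out.println(resolveDataSource(null));     // prints "hdfs"
    }
}
```

With defaults resolved this way, neither option has to be marked as required 
for the program to run.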

> Options in Bayes TrainClassifier and TestClassifier
> ---------------------------------------------------
>
>                 Key: MAHOUT-509
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-509
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>            Reporter: Joe Prasanna Kumar
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: MAHOUT-509.patch
>
>
> Hi all,
> As I was going through the Wikipedia example, I encountered a situation with 
> TrainClassifier wherein some of the options with default values are actually 
> mandatory. 
> The documentation / command line help says that the 
> default source (--datasource) is hdfs, but TrainClassifier calls 
> withRequired(true) while building the --datasource option. The code checks 
> whether dataSourceType is hbase and otherwise sets it to hdfs, so ideally 
> withRequired should be set to false.
> The default --classifierType is bayes, but withRequired is set to true and we 
> have code like
> if ("bayes".equalsIgnoreCase(classifierType)) {
>   log.info("Training Bayes Classifier");
>   trainNaiveBayes(inputPath, outputPath, params);
> } else if ("cbayes".equalsIgnoreCase(classifierType)) {
>   log.info("Training Complementary Bayes Classifier");
>   // setup the HDFS and copy the files there, then run the trainer
>   trainCNaiveBayes(inputPath, outputPath, params);
> }
> which should be changed to
> if ("cbayes".equalsIgnoreCase(classifierType)) {
>   log.info("Training Complementary Bayes Classifier");
>   trainCNaiveBayes(inputPath, outputPath, params);
> } else {
>   log.info("Training Bayes Classifier");
>   // setup the HDFS and copy the files there, then run the trainer
>   trainNaiveBayes(inputPath, outputPath, params);
> }
> Please let me know if this looks valid and I'll submit a patch for a JIRA 
> issue.
> Regards,
> Joe.
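
One consequence of the rearranged branch quoted above may be worth spelling 
out. In the original form, a classifierType that is neither bayes nor cbayes 
matches no branch, so nothing is trained. In the proposed form, every value 
other than cbayes (including a missing value or a typo) trains the plain 
Bayes classifier. The sketch below compares the two; DispatchSketch and its 
methods are hypothetical and return a label instead of calling the real 
trainNaiveBayes / trainCNaiveBayes methods.

```java
// Hypothetical comparison of the two branch structures from the issue.
// Each method returns the name of the trainer that would be invoked.
class DispatchSketch {
    // Original structure: only the two exact names are handled.
    static String originalDispatch(String classifierType) {
        if ("bayes".equalsIgnoreCase(classifierType)) {
            return "bayes";
        } else if ("cbayes".equalsIgnoreCase(classifierType)) {
            return "cbayes";
        }
        return "none"; // no branch matched, so no trainer runs
    }

    // Proposed structure: cbayes is the special case, bayes is the fallback.
    static String proposedDispatch(String classifierType) {
        if ("cbayes".equalsIgnoreCase(classifierType)) {
            return "cbayes";
        }
        return "bayes";
    }

    public static void main(String[] args) {
        // Option omitted: equalsIgnoreCase(null) is false, so no exception.
        System.out.println(originalDispatch(null));     // prints "none"
        System.out.println(proposedDispatch(null));     // prints "bayes"
        // A misspelled value silently falls back to bayes in the new form.
        System.out.println(proposedDispatch("cbayse")); // prints "bayes"
    }
}
```

If silently falling back on a misspelled value is not wanted, one option 
would be to validate classifierType before dispatching; that is a design 
choice outside what the issue proposes.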

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
