Dear All,

I would like to share an issue I ran into with the Complementary Naive Bayes
classifier while developing a text classification system with Mahout. I
was trying to compare the results of the Standard Naive Bayes classifier with
those of the Complementary Naive Bayes classifier, but strangely I got the
same accuracy for both classifiers. I tried several datasets, with the same
outcome each time.


So I looked into the source code of the two driver classes,
*org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.java* and
*org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.java*. I
found the following two lines, because of which the Standard Naive Bayes
classifier was always being called. The "-c" option, which should select the
Complementary Naive Bayes classifier, had no effect.

*TrainNaiveBayesJob.java : line 96*
    boolean trainComplementary = Boolean.parseBoolean(getOption(TRAIN_COMPLEMENTARY));
    // always evaluates to false, because getOption(TRAIN_COMPLEMENTARY) always returns null.

*TestNaiveBayesDriver.java : line 139*
    boolean complementary = parsedArgs.containsKey("testComplementary");
    // always evaluates to false, because the key in the Map parsedArgs is "--testComplementary", not "testComplementary".

Because of this, the Complementary Naive Bayes classifier was never called.
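
The two failure modes can be reproduced outside Mahout. Below is a minimal sketch, not Mahout code: `parse` and `valueOf` are hypothetical stand-ins, and the sketch assumes (as described above) that parsed keys keep their "--" prefix and that a flag supplied without a value yields null on lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlagLookupPitfalls {

    // Hypothetical stand-in for the parsed-argument map: every key keeps its
    // "--" prefix, and a flag given without a value maps to an empty list.
    static Map<String, List<String>> parse(String... flagNames) {
        Map<String, List<String>> parsedArgs = new HashMap<>();
        for (String name : flagNames) {
            parsedArgs.put("--" + name, Collections.<String>emptyList());
        }
        return parsedArgs;
    }

    // Hypothetical stand-in for a value lookup: returns the first value of
    // the option, or null when the option is absent or carries no value.
    static String valueOf(Map<String, List<String>> parsedArgs, String name) {
        List<String> values = parsedArgs.get("--" + name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    public static void main(String[] args) {
        Map<String, List<String>> parsedArgs = parse("trainComplementary", "testComplementary");

        // Pattern 1: the flag is present but has no value, so the lookup
        // returns null and Boolean.parseBoolean(null) is false.
        System.out.println(Boolean.parseBoolean(valueOf(parsedArgs, "trainComplementary")));

        // Pattern 2: the lookup key lacks the "--" prefix, so it never matches.
        System.out.println(parsedArgs.containsKey("testComplementary"));

        // With the prefix, the same flag is found.
        System.out.println(parsedArgs.containsKey("--testComplementary"));
    }
}
```

Under these assumptions the program prints false, false, true: both original checks report the flag as unset even though it was supplied.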

So I made the following changes, and that worked!

*TrainNaiveBayesJob.java :*
    boolean trainComplementary = hasOption(TRAIN_COMPLEMENTARY);

*TestNaiveBayesDriver.java :*
    boolean complementary = hasOption("testComplementary");
    // or: complementary = parsedArgs.containsKey("--testComplementary");
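
Both changes treat the option as a flag whose presence alone matters. A minimal sketch of such a presence check follows; `hasFlag` is a hypothetical helper for illustration, not Mahout's `hasOption`, and it assumes keys are stored with the "--" prefix.

```java
import java.util.HashSet;
import java.util.Set;

class PresenceFlagSketch {

    // Hypothetical helper, not Mahout's hasOption: reports whether a flag was
    // supplied, adding the "--" prefix under which keys are assumed to be stored.
    static boolean hasFlag(Set<String> parsedKeys, String name) {
        return parsedKeys.contains("--" + name);
    }

    public static void main(String[] args) {
        Set<String> withFlag = new HashSet<>();
        withFlag.add("--testComplementary");
        Set<String> withoutFlag = new HashSet<>();

        // The flag is detected when supplied, regardless of any value.
        System.out.println(hasFlag(withFlag, "testComplementary"));
        // The flag is reported as unset when it was not supplied.
        System.out.println(hasFlag(withoutFlag, "testComplementary"));
    }
}
```

Keeping the prefix handling in one helper means callers pass the bare option name and cannot mismatch the stored key.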

Please find the attached patch.

With Regards,
Gouri Sankar Majumder
### Eclipse Workspace Patch 1.0
#P mahout-trunk
Index: core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java
===================================================================
--- core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java   (revision 1553649)
+++ core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java   (working copy)
@@ -136,7 +136,9 @@
     Job testJob = prepareJob(getInputPath(), getOutputPath(), SequenceFileInputFormat.class, BayesTestMapper.class,
             Text.class, VectorWritable.class, SequenceFileOutputFormat.class);
     //testJob.getConfiguration().set(LABEL_KEY, getOption("--labels"));
-    boolean complementary = parsedArgs.containsKey("testComplementary");
+    
+    //boolean complementary = parsedArgs.containsKey("testComplementary"); //always result to false as key in hash map is "--testComplementary"
+    boolean complementary = hasOption("testComplementary"); //or  complementary = parsedArgs.containsKey("--testComplementary");
     testJob.getConfiguration().set(COMPLEMENTARY, String.valueOf(complementary));
     return testJob.waitForCompletion(true);
   }
Index: core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java
===================================================================
--- core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java (revision 1553649)
+++ core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java (working copy)
@@ -93,8 +93,8 @@
     }
     long labelSize = createLabelIndex(labPath);
     float alphaI = Float.parseFloat(getOption(ALPHA_I));
-    boolean trainComplementary = Boolean.parseBoolean(getOption(TRAIN_COMPLEMENTARY));
-
+    //boolean trainComplementary = Boolean.parseBoolean(getOption(TRAIN_COMPLEMENTARY)); //always result to false
+    boolean trainComplementary = hasOption(TRAIN_COMPLEMENTARY);
 
     HadoopUtil.setSerializations(getConf());
     HadoopUtil.cacheFiles(labPath, getConf());
