Dear All,
I would like to share an issue that I ran into with the Complementary Naive Bayes
classifier while developing a text classification system using Mahout. I
was trying to compare the results of the Standard Naive Bayes classifier with
those of the Complementary Naive Bayes classifier, but strangely I was getting
the same accuracy for both classifiers. I tried several datasets, always with
the same outcome.
So I looked into the source code of the two driver classes,
*org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.java* and
*org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.java*. I
found the following two lines, because of which the Standard Naive Bayes
classifier was always being used. The "-c" option, which is meant to select the
Complementary Naive Bayes classifier, had no effect.
*TrainNaiveBayesJob.java : line 96*
boolean trainComplementary = Boolean.parseBoolean(getOption(TRAIN_COMPLEMENTARY));
// Always evaluates to false: getOption(TRAIN_COMPLEMENTARY) always returns
// null, and Boolean.parseBoolean(null) is false.
*TestNaiveBayesDriver.java : line 139*
boolean complementary = parsedArgs.containsKey("testComplementary");
// Always evaluates to false: the key in the parsedArgs map is
// "--testComplementary", not "testComplementary".
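Both failure modes can be reproduced outside Mahout. The following is only a minimal sketch: the class `FlagLookupDemo` and its `HashMap` are hypothetical stand-ins for the parsed-argument map, and it assumes (as observed above) that keys carry the "--" prefix and that a flag given without a value is stored as null.

```java
import java.util.HashMap;
import java.util.Map;

public class FlagLookupDemo {

    // Hypothetical stand-in for the parsed-argument map (not Mahout's own class).
    // Assumption: keys are stored with a "--" prefix and a flag has a null value.
    static Map<String, String> parsedArgsWithFlags() {
        Map<String, String> parsedArgs = new HashMap<>();
        parsedArgs.put("--trainComplementary", null);
        parsedArgs.put("--testComplementary", null);
        return parsedArgs;
    }

    // Pattern from the training job: parse the (null) value of a flag.
    static boolean trainPattern(Map<String, String> parsedArgs) {
        return Boolean.parseBoolean(parsedArgs.get("--trainComplementary"));
    }

    // Pattern from the test driver: look the flag up without its "--" prefix.
    static boolean testPattern(Map<String, String> parsedArgs) {
        return parsedArgs.containsKey("testComplementary");
    }

    // Presence check with the prefixed key, which is what a flag needs.
    static boolean prefixedLookup(Map<String, String> parsedArgs) {
        return parsedArgs.containsKey("--testComplementary");
    }

    public static void main(String[] args) {
        Map<String, String> parsedArgs = parsedArgsWithFlags();
        System.out.println("parseBoolean on flag value: " + trainPattern(parsedArgs));   // false
        System.out.println("containsKey without prefix: " + testPattern(parsedArgs));    // false
        System.out.println("containsKey with prefix:    " + prefixedLookup(parsedArgs)); // true
    }
}
```

Even though both flags are present in the map, the first two checks report false; only the presence check on the prefixed key sees the flag.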
As a result, the Complementary Naive Bayes classifier was never being invoked.
I made the following changes, and with them the "-c" option works:
*TrainNaiveBayesJob.java :*
boolean trainComplementary = hasOption(TRAIN_COMPLEMENTARY);
*TestNaiveBayesDriver.java :*
boolean complementary = hasOption("testComplementary");
// or: boolean complementary = parsedArgs.containsKey("--testComplementary");
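To illustrate why a presence check is the right test for a flag, here is a small self-contained sketch. `HasOptionSketch`, `keyFor`, and `addFlag` are hypothetical names for this example only; the sketch assumes that `hasOption` simply prefixes "--" to the option name before looking it up, which is consistent with the two equivalent forms shown above but is not copied from the Mahout sources.

```java
import java.util.HashMap;
import java.util.Map;

public class HasOptionSketch {

    // Hypothetical argument map: "--name" -> value (null for a valueless flag).
    private final Map<String, String> argMap = new HashMap<>();

    // Assumed key convention: option names are stored with a "--" prefix.
    static String keyFor(String optionName) {
        return "--" + optionName;
    }

    // Records a flag that was given on the command line without a value.
    void addFlag(String optionName) {
        argMap.put(keyFor(optionName), null);
    }

    // Presence check: true whenever the flag was given, whatever its value.
    boolean hasOption(String optionName) {
        return argMap.containsKey(keyFor(optionName));
    }

    public static void main(String[] args) {
        HasOptionSketch withFlag = new HasOptionSketch();
        withFlag.addFlag("testComplementary");
        HasOptionSketch withoutFlag = new HasOptionSketch();

        System.out.println(withFlag.hasOption("testComplementary"));    // true
        System.out.println(withoutFlag.hasOption("testComplementary")); // false
    }
}
```

With this kind of check the result depends only on whether the flag was passed, so the standard and complementary code paths can both be reached.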
Please find the patch attached.
With Regards,
Gouri Sankar Majumder
### Eclipse Workspace Patch 1.0
#P mahout-trunk
Index: core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java
===================================================================
--- core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java (revision 1553649)
+++ core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java (working copy)
@@ -136,7 +136,9 @@
Job testJob = prepareJob(getInputPath(), getOutputPath(), SequenceFileInputFormat.class, BayesTestMapper.class,
Text.class, VectorWritable.class, SequenceFileOutputFormat.class);
//testJob.getConfiguration().set(LABEL_KEY, getOption("--labels"));
- boolean complementary = parsedArgs.containsKey("testComplementary");
+
+ //boolean complementary = parsedArgs.containsKey("testComplementary"); //always result to false as key in hash map is "--testComplementary"
+ boolean complementary = hasOption("testComplementary"); //or complementary = parsedArgs.containsKey("--testComplementary");
testJob.getConfiguration().set(COMPLEMENTARY, String.valueOf(complementary));
return testJob.waitForCompletion(true);
}
Index: core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java
===================================================================
--- core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java (revision 1553649)
+++ core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java (working copy)
@@ -93,8 +93,8 @@
}
long labelSize = createLabelIndex(labPath);
float alphaI = Float.parseFloat(getOption(ALPHA_I));
- boolean trainComplementary = Boolean.parseBoolean(getOption(TRAIN_COMPLEMENTARY));
-
+ //boolean trainComplementary = Boolean.parseBoolean(getOption(TRAIN_COMPLEMENTARY)); //always result to false
+ boolean trainComplementary = hasOption(TRAIN_COMPLEMENTARY);
HadoopUtil.setSerializations(getConf());
HadoopUtil.cacheFiles(labPath, getConf());