I've got a proposed fix. I'll put up a patch tomorrow, assuming
testing works out.
On Oct 31, 2008, at 6:17 PM, Grant Ingersoll wrote:
See MAHOUT-92.
On Oct 31, 2008, at 5:19 PM, Grant Ingersoll wrote:
Hi Robin,
I'm trying to get the Bayes stuff working on the 20 Newsgroups data
per the instructions on MAHOUT-20. It looks like the
BayesFeatureMapper isn't really doing anything. Sean put in a
"TODO" comment on line 72, and it pretty much shows that nothing
ever gets added to the word_list.
When I go to run this, I get:
08/10/31 17:18:09 INFO bayes.BayesDriver: Calculating Tf-Idf...
08/10/31 17:18:09 INFO common.BayesTfIdfDriver: Counts of documents in Each Label
08/10/31 17:18:09 INFO common.BayesTfIdfDriver: {rec.motorcycles=994.0, comp.windows.x=980.0,
    talk.politics.guns=910.0, talk.politics.mideast=940.0,
    talk.religion.misc=628.0, rec.sport.baseball=994.0,
    rec.autos=990.0, rec.sport.hockey=999.0,
    comp.sys.mac.hardware=961.0, comp.sys.ibm.pc.hardware=982.0,
    sci.space=987.0, talk.politics.misc=775.0, sci.electronics=981.0,
    comp.graphics=973.0, sci.crypt=991.0, sci.med=990.0,
    soc.religion.christian=997.0, alt.atheism=799.0,
    misc.forsale=972.0, comp.os.ms-windows.misc=985.0}
08/10/31 17:18:09 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
08/10/31 17:18:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/31 17:18:10 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-termDocCount
Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-wordFreq
Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-featureCount
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:210)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
    at org.apache.mahout.classifier.bayes.common.BayesTfIdfDriver.runJob(BayesTfIdfDriver.java:112)
    at org.apache.mahout.classifier.bayes.BayesDriver.runJob(BayesDriver.java:76)
    at org.apache.mahout.classifier.bayes.BayesDriver.main(BayesDriver.java:54)
I'm pretty sure the mapper needs to populate the word list from the
input value. Right?
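
To make the idea concrete, here is a minimal sketch in plain Java. It is
not the actual BayesFeatureMapper code and leaves out all the Hadoop
mapper plumbing. It assumes the input value is whitespace-separated
text; the class WordListSketch, the methods buildWordList and
countWords, and the "label,word" key format are all hypothetical names
made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper logic; NOT the real BayesFeatureMapper.
public class WordListSketch {

    // Fill the word list from the input value (assumed to be
    // whitespace-separated text), skipping empty tokens.
    public static List<String> buildWordList(String value) {
        List<String> wordList = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                wordList.add(token);
            }
        }
        return wordList;
    }

    // Count occurrences of each word under a label. The "label,word"
    // key format is an assumption made for this sketch only.
    public static Map<String, Integer> countWords(String label, List<String> wordList) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : wordList) {
            counts.merge(label + "," + word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = buildWordList("the quick brown fox jumps over the lazy dog");
        System.out.println(countWords("rec.autos", words));
    }
}
```

If the word list stays empty, as it seems to now, countWords returns an
empty map, which would be consistent with the later jobs finding no
input to read.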
-Grant