See MAHOUT-92.

On Oct 31, 2008, at 5:19 PM, Grant Ingersoll wrote:

Hi Robin,

I'm trying to get the Bayes stuff working on the 20 Newsgroups data per the instructions on MAHOUT-20. It seems like the BayesFeatureMapper isn't really doing anything. Sean put in a "TODO" comment on line 72, and it pretty much shows that nothing is ever added to the word_list.

When I go to run this, I get:
08/10/31 17:18:09 INFO bayes.BayesDriver: Calculating Tf-Idf...
08/10/31 17:18:09 INFO common.BayesTfIdfDriver: Counts of documents in Each Label
08/10/31 17:18:09 INFO common.BayesTfIdfDriver: {rec.motorcycles=994.0, comp.windows.x=980.0, talk.politics.guns=910.0, talk.politics.mideast=940.0, talk.religion.misc=628.0, rec.sport.baseball=994.0, rec.autos=990.0, rec.sport.hockey=999.0, comp.sys.mac.hardware=961.0, comp.sys.ibm.pc.hardware=982.0, sci.space=987.0, talk.politics.misc=775.0, sci.electronics=981.0, comp.graphics=973.0, sci.crypt=991.0, sci.med=990.0, soc.religion.christian=997.0, alt.atheism=799.0, misc.forsale=972.0, comp.os.ms-windows.misc=985.0}
08/10/31 17:18:09 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
08/10/31 17:18:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/31 17:18:10 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-termDocCount
Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-wordFreq
Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-featureCount
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:210)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
        at org.apache.mahout.classifier.bayes.common.BayesTfIdfDriver.runJob(BayesTfIdfDriver.java:112)
        at org.apache.mahout.classifier.bayes.BayesDriver.runJob(BayesDriver.java:76)
        at org.apache.mahout.classifier.bayes.BayesDriver.main(BayesDriver.java:54)

I'm pretty sure we need to add something to the word list from the input value. Right?

-Grant
