Hello all, I am just starting out with Mahout, and to get my feet wet I am running through the TwentyNewsGroups example. I have successfully configured a single-node Hadoop system as well as a pseudo-distributed Hadoop system on two separate machines. In both environments, I have followed the guide and put all the news inputs into the folder 20news-input. I am able to ls and cat the files in that directory.
However, when I run the TrainClassifier, I get the following message:

10/04/05 09:48:33 INFO bayes.TrainClassifier: Training Complementary Bayes Classifier
10/04/05 09:48:33 INFO cbayes.CBayesDriver: Reading features...
10/04/05 09:48:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/04/05 09:48:33 INFO mapred.FileInputFormat: Total input paths to process : 19
Exception in thread "main" java.io.IOException: Not a file: hdfs://localhost:9000/user/bob/20news-input/comp.graphics
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:206)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureDriver.runJob(BayesFeatureDriver.java:75)
        at org.apache.mahout.classifier.bayes.mapreduce.cbayes.CBayesDriver.runJob(CBayesDriver.java:61)
        at org.apache.mahout.classifier.bayes.TrainClassifier.trainCNaiveBayes(TrainClassifier.java:56)
        at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:128)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I get this error on both the single-node system I have set up and the separate dual-node system. As I said before, I am able to cat and ls that directory and the files in it perfectly fine.

Any thoughts? Thanks!