Gangadhar,

After some system issues, I finally ran the TrainClassifier. About 65% into the map job, I got the same error that you mentioned:

INFO mapred.JobClient: Task Id : attempt_201009160819_0002_m_000000_0, Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201009160819_0002/attempt_201009160819_0002_m_000000_0/output/file.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
    ...

I haven't analyzed the root cause or a solution yet; I just wanted to confirm that I am facing the same issue as you. I'll search and analyze further and post more details.

reg,
Joe.

On Wed, Sep 15, 2010 at 10:20 PM, Joe Kumar wrote:

> Hi Gangadhar,
>
> Right. I did the same to execute the TrainClassifier, but since the
> default datasource is hdfs, we should not be required to provide this
> parameter.
> I haven't finished executing the TrainClassifier yet. I'll do it tonight and
> let you know if I run into trouble.
>
> reg,
> Joe.
>
>
> On Wed, Sep 15, 2010 at 9:41 PM, Gangadhar Nittala wrote:
>
>> I ran into the issue that Joe mentioned about the command line
>> parameters. I just added the datasource to the command line, like this:
>>
>> $HADOOP_HOME/bin/hadoop jar
>> $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
>> org.apache.mahout.classifier.bayes.TrainClassifier --gramSize 3
>> --input wikipediainput10 --output wikipediamodel10 --classifierType
>> bayes --dataSource hdfs
>>
>> On a related note, Joe, were you able to run the TrainClassifier
>> without any errors? When I tried this, the map-reduce job would always
>> abort at 99%. I tried the example that was given in the wiki with
>> both subjects and countries. I even reduced the list of countries in
>> country.txt, assuming that was what was causing the issue. No
>> matter what, the classifier task fails. And the exception in the task
>> log:
>>
>> 2010-09-14 08:25:27,026 INFO org.apache.hadoop.mapred.MapTask: bufstart = 41271492; bufend = 58259002; bufvoid = 99614720
>> 2010-09-14 08:25:27,026 INFO org.apache.hadoop.mapred.MapTask: kvstart = 196379; kvend = 130842; length = 327680
>> 2010-09-14 08:25:48,136 INFO org.apache.hadoop.mapred.MapTask: Finished spill 287
>> 2010-09-14 08:25:48,417 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
>> 2010-09-14 08:26:00,386 INFO org.apache.hadoop.mapred.MapTask: Finished spill 288
>> 2010-09-14 08:26:08,765 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201009132133_0002/attempt_201009132133_0002_m_000001_3/output/file.out
>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
>>     at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1469)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> I checked the Hadoop JIRA and this seems to be fixed already:
>> https://issues.apache.org/jira/browse/HADOOP-4963. I am not sure what
>> I am doing wrong. Any suggestions on what I need to change to get this
>> fixed would be very helpful. I have been struggling with this for a
>> while now.
>>
>> Thank you
>>
>> On Wed, Sep 15, 2010 at 1:16 AM, Joe Kumar wrote:
>> > Robin,
>> >
>> > Sure. I'll submit a patch.
>> >
>> > The command line flag already has the default behavior specified:
>> >
>> >   --classifierType (-type) classifierType    Type of classifier: bayes|cbayes.
>> >                                              Default: bayes
>> >   --dataSource (-source) dataSource          Location of model: hdfs|hbase.
>> >                                              Default Value: hdfs
>> >
>> > So there is no change in the flag description.
>> >
>> > reg,
>> > Joe.
>> >
>> >
>> > On Wed, Sep 15, 2010 at 1:10 AM, Robin Anil wrote:
>> >
>> >> On Wed, Sep 15, 2010 at 10:26 AM, Joe Kumar wrote:
>> >>
>> >> > Hi all,
>> >> >
>> >> > As I was going through the wikipedia example, I encountered a situation
>> >> > with TrainClassifier wherein some of the options with default values are
>> >> > actually mandatory.
>> >> > The documentation / command line help says that
>> >> >
>> >> > 1. The default source (--dataSource) is hdfs, but TrainClassifier
>> >> >    has withRequired(true) while building the --dataSource option. We
>> >> >    check whether the dataSourceType is hbase and otherwise set it to
>> >> >    hdfs, so ideally withRequired should be set to false.
>> >> > 2. The default --classifierType is bayes, but withRequired is set to
>> >> >    true and we have code like
>> >> >
>> >> > if ("bayes".equalsIgnoreCase(classifierType)) {
>> >> >   log.info("Training Bayes Classifier");
>> >> >   trainNaiveBayes(inputPath, outputPath, params);
>> >> >
>> >> > } else if ("cbayes".equalsIgnoreCase(classifierType)) {
>> >> >   log.info("Training Complementary Bayes Classifier");
>> >> >   // setup the HDFS and copy the files there, then run the trainer
>> >> >   trainCNaiveBayes(inputPath, outputPath, params);
>> >> > }
>> >> >
>> >> > which should be changed to
>> >> >
>> >> > *if ("cbayes".equalsIgnoreCase(classifierType)) {*
>> >> >   log.info("Training Complementary Bayes Classifier");
>> >> >   trainCNaiveBayes(inputPath, outputPath, params);
>> >> >
>> >> > } *else {*
>> >> >   log.info("Training Bayes Classifier");
>> >> >   // setup the HDFS and copy the files there, then run the trainer
>> >> >   trainNaiveBayes(inputPath, outputPath, params);
>> >> > }
>> >> >
>> >> > Please let me know if this looks valid and I'll submit a patch for a
>> >> > JIRA issue.
>> >> >
>> >> +1, all valid. Go ahead and fix it, and in the cmdline flags write the
>> >> default behavior in the flag description.
>> >>
>> >> > reg
>> >> > Joe.
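
The default-fallback idea proposed in the quoted thread can be sketched in isolation as follows. This is only a minimal illustration, not the Mahout code: the class name ClassifierTypeDefaults and the methods resolveTrainer and resolveDataSource are hypothetical, and the returned strings stand in for the calls to trainNaiveBayes and trainCNaiveBayes. A null argument models a flag that was left off the command line.

```java
// Hypothetical sketch of the proposed fallback; not the actual TrainClassifier.
final class ClassifierTypeDefaults {

    private ClassifierTypeDefaults() {}

    // Picks the trainer for a --classifierType value.
    // Anything other than "cbayes" (including null) falls back to "bayes".
    static String resolveTrainer(String classifierType) {
        if ("cbayes".equalsIgnoreCase(classifierType)) {
            return "cbayes";
        } else {
            return "bayes";
        }
    }

    // Same pattern for --dataSource: only "hbase" overrides the hdfs default.
    static String resolveDataSource(String dataSource) {
        return "hbase".equalsIgnoreCase(dataSource) ? "hbase" : "hdfs";
    }

    public static void main(String[] args) {
        // Flag omitted: both values fall back to their documented defaults.
        System.out.println(resolveTrainer(null) + " / " + resolveDataSource(null));
        // Flag given explicitly, in any letter case.
        System.out.println(resolveTrainer("CBayes") + " / " + resolveDataSource("HBase"));
    }
}
```

Because the string literal is the receiver of equalsIgnoreCase, a missing (null) value is safe and simply takes the else branch. One trade-off of this shape is that a mistyped value is also silently treated as the default rather than rejected.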
