It seems strange that, if you are running on your cluster, there are references to LocalJobRunner and RawLocalFileSystem in the output.
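
Those two class names usually mean the client never picked up the cluster configuration. Assuming a 0.20-era Hadoop, mapred.job.tracker defaults to "local" (hence LocalJobRunner) and fs.default.name defaults to file:/// (hence RawLocalFileSystem). Here is a rough sketch of what then happens to a path with no scheme. It uses plain java.net.URI as a stand-in for Hadoop's own path qualification, so it is only an approximation. The class name, method name, and NameNode address are made up for illustration.

```java
import java.net.URI;

// Sketch only: approximates how a scheme-less path is qualified against a
// default filesystem URI. Plain java.net.URI stands in for Hadoop's own
// Path/FileSystem logic, which is not reproduced here.
class DefaultFsDemo {

    // Resolve a scheme-less absolute path against the default filesystem URI.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        String dict = "/user/jake/status_parsed/dictionary.file-0";

        // Hadoop's built-in default when no cluster config is loaded.
        URI local = qualify("file:///", dict);

        // Hypothetical NameNode address, for illustration only.
        URI hdfs = qualify("hdfs://namenode.example.com:8020/", dict);

        // The scheme shows which filesystem the same path string ends up on.
        System.out.println(local.getScheme() + " -> " + local);
        System.out.println(hdfs.getScheme() + " -> " + hdfs);
    }
}
```

If that is what is going on here, the dictionary file is being looked up on the local disk of the machine that launched the job, not on HDFS. In that case it would be worth checking that the cluster's conf directory is on the classpath of the process running seq2sparse.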

On Jul 29, 2011, at 11:04 PM, Jake Mannix wrote:

> Not sure if this is something with my prod cluster, or a bug, but when
> running seq2sparse on my production hadoop cluster, I keep making it all the
> way through the tokenization, dictionary creation, etc, but then the
> TFPartialVectorReducer blows up:
>
> 11/07/30 06:00:04 INFO mapred.LocalJobRunner:
> 11/07/30 06:00:04 INFO mapred.TaskRunner: Task 'attempt_local_0003_m_000003_0' done.
> 11/07/30 06:00:04 INFO mapred.LocalJobRunner:
> 11/07/30 06:00:04 INFO mapred.Merger: Merging 4 sorted segments
> 11/07/30 06:00:04 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 243328920 bytes
> 11/07/30 06:00:04 INFO mapred.LocalJobRunner:
> 11/07/30 06:00:04 WARN mapred.LocalJobRunner: job_local_0003
> java.lang.IllegalStateException: /user/jake/status_parsed/dictionary.file-0
>   at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
>   at org.apache.mahout.vectorizer.term.TFPartialVectorReducer.setup(TFPartialVectorReducer.java:130)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:215)
> Caused by: java.io.FileNotFoundException: File file:/user/jake/status_parsed/dictionary.file-0 does not exist.
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:372)
>   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>   at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:718)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>   at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
>   at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
>   ... 5 more
> 11/07/30 06:00:04 INFO mapred.JobClient: Job complete: job_local_0003
>
>
> The file listed (without a filesystem uri!)
> "/user/jake/status_parsed/dictionary.file-0" exists on the cluster, but it's
> probably not showing up in the DistributedCache properly somehow.
>
> Anyone run into anything like this before? It's been a while since I've run
> seq2sparse on a real-hardware / managed cluster, not sure if it's me, or
> mahout, or a configuration setting somehow.
>
> -jake

--------------------------------------------
Grant Ingersoll
