2010/1/18 Olivier Grisel <olivier.gri...@ensta.org>:
> 2010/1/18 Robin Anil <robin.a...@gmail.com>:
>> could you be specific on which map/reduce job you encountered the error?
>
> I thought it was on:
>
> hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job \
>     org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver \
>     -i "wikipediadump/chunk-0001.xml" -o wikipediainput-eof-exception \
>     -c examples/src/test/resources/country.txt
>
> I just ran it again... successfully... The next time I encounter that
> error I will note the complete stack trace, however uninformative it
> looks.
I ran the same job again on all the chunks and could reproduce the error:

$ hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job \
      org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver \
      -i "wikipediadump" -o wikipediainput-eof-exception \
      -c examples/src/test/resources/country.txt
[...]
10/01/18 16:20:46 INFO mapred.JobClient:  map 100% reduce 83%
10/01/18 16:21:42 INFO mapred.JobClient: Task Id : attempt_201001172109_0010_r_000000_2, Status : FAILED
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

I have no idea where it could possibly stem from.

--
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name