I have been stumped on this problem for a few days now, and I hope somebody here has a clue. I'm running Hadoop 0.5.0.
I'm fetching records from a database via a buffering proxy server. The goal is to write them all to a Lucene index. The map task is just the identity map: it collects each key/value pair without any processing. All the heavy lifting is done in the reduce phase.

My problem is that the map tasks run for a while, seemingly without problems, and then I start getting a lot of messages like this:

2006-08-24 15:46:53,465 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Need 13 map output(s)
2006-08-24 15:46:53,465 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Need 13 map output location(s)
2006-08-24 15:46:53,466 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Got 0 map outputs from jobtracker
2006-08-24 15:46:53,466 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Got 0 known map output location(s); scheduling...
2006-08-24 15:46:53,466 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2006-08-24 15:46:53,467 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000001_0 0.0% reduce > copy >

Finally, all tasks are marked as failed and Hadoop seems to sleep. There are no error messages in the logs or anywhere else (I have set log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG in hadoop/conf/log4j.properties).

Any clues?

Cheers,
Mikkel