I have been stumped on this problem for a few days now, and I hope somebody
here has a clue. I'm running Hadoop 0.5.0.

I'm fetching records from a database via a buffering proxy server. The
goal is to write them all to a Lucene index.

The map task is just the identity map: it collects each key/value
pair without any processing. All the heavy lifting is done in the reduce
phase.

My problem is that the map tasks run for a while, seemingly without
problems, and then the logs start filling up with messages like this:

2006-08-24 15:46:53,465 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Need 13 map output(s)
2006-08-24 15:46:53,465 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Need 13 map output location(s)
2006-08-24 15:46:53,466 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Got 0 map outputs from jobtracker
2006-08-24 15:46:53,466 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Got 0 known map output location(s); scheduling...
2006-08-24 15:46:53,466 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000001_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2006-08-24 15:46:53,467 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000001_0 0.0% reduce > copy >

Eventually all tasks are marked as failed and Hadoop seems to go idle. There
are no error messages in the logs or anywhere else (I have set
log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG in
hadoop/conf/log4j.properties).
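
For reference, here is a sketch of what that fragment of
hadoop/conf/log4j.properties looks like. Only the JobTracker line is
actually set here. The TaskTracker and TaskRunner lines are an assumption,
shown as a possible next step because those are the classes printing the
messages above.

```properties
# Currently set: debug output from the JobTracker
log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG

# Assumption, not part of the setup described above: the same level for
# the two classes that emit the "Need N map output(s)" messages
log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
log4j.logger.org.apache.hadoop.mapred.TaskRunner=DEBUG
```

The logger names simply follow the class names shown in the log lines, so
no other configuration keys are assumed.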

Any clues?

Cheers,
Mikkel
