Johnson, Jorgen wrote:
Create a QueueInputFormat, which provides a RecordReader implementation that 
pops values off a globally accessible queue*.  This would require filling the 
queue with values prior to loading the map/red job.  This would allow the 
mappers to cram values back into the queue for further processing when 
necessary.

Maintaining such a queue would be tricky, I think. One concern is that a task might pop the last item from the queue, and the job would then terminate prematurely, even though processing that item might push new values back into the queue. To prevent this, you would need to leave items in the queue until their map processing completes, while also ensuring that no other map task removes them in the meantime. Then you'd need to handle failed tasks, whose in-progress items would have to become available again. I think it would be far simpler to do this iteratively, with multiple mapreduce passes: each pass writes its output to a new temporary directory that becomes the input for the next pass. This performs a breadth-first traversal of the space, with one mapreduce stage per depth.
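The iterative approach can be sketched in Java, the language Hadoop itself is written in. This is a single-process simulation under stated assumptions, not Hadoop code: the class `IterativeBfs`, the `expand` function (a made-up tree in which node n has children 2n and 2n+1 below a limit), the `TreeSet` standing in for a reduce stage, and the `pass-N/part-00000` directory layout are all hypothetical stand-ins for a real map function and real job output directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IterativeBfs {

    // Hypothetical "map" function: node n has children 2n and 2n+1, kept only if below limit.
    static List<Long> expand(long node, long limit) {
        List<Long> children = new ArrayList<>();
        for (long child : new long[] {2 * node, 2 * node + 1}) {
            if (child < limit) {
                children.add(child);
            }
        }
        return children;
    }

    // Runs passes until one emits nothing; returns how many input records each pass read.
    static List<Integer> run(long root, long limit) {
        try {
            return runPasses(root, limit);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static List<Integer> runPasses(long root, long limit) throws IOException {
        // Intermediate directories are left in place, as they would be between chained jobs.
        Path workDir = Files.createTempDirectory("bfs");
        Path input = workDir.resolve("pass-0");
        Files.createDirectory(input);
        Files.write(input.resolve("part-00000"),
                List.of(Long.toString(root)), StandardCharsets.UTF_8);

        List<Integer> inputSizes = new ArrayList<>();
        for (int pass = 1; ; pass++) {
            List<Path> files;
            try (Stream<Path> listing = Files.list(input)) {
                files = listing.sorted().collect(Collectors.toList());
            }
            // Map stage: expand every record at this depth.
            // The TreeSet stands in for a reduce stage that sorts and de-duplicates.
            TreeSet<Long> next = new TreeSet<>();
            int count = 0;
            for (Path file : files) {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    count++;
                    next.addAll(expand(Long.parseLong(line.trim()), limit));
                }
            }
            inputSizes.add(count);
            if (next.isEmpty()) {
                return inputSizes; // nothing left to expand: traversal is complete
            }
            // Write this depth's output to a fresh directory: the next pass's input.
            Path output = workDir.resolve("pass-" + pass);
            Files.createDirectory(output);
            List<String> lines = new ArrayList<>();
            for (long value : next) {
                lines.add(Long.toString(value));
            }
            Files.write(output.resolve("part-00000"), lines, StandardCharsets.UTF_8);
            input = output;
        }
    }
}
```

With `IterativeBfs.run(1, 16)` the four passes read 1, 2, 4 and 8 records, one tree level each, and the loop stops because none of nodes 8 through 15 has a child below 16. Termination falls out of the structure: a pass is done when its input directory has been fully consumed, so there is no shared queue whose emptiness must be interpreted while other tasks are still running.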

Doug
