Johnson, Jorgen wrote:
Create a QueueInputFormat, which provides a RecordReader implementation that pops values off a globally accessible queue*. This would require filling the queue with values prior to loading the map/red job. This would allow the mappers to cram values back into the queue for further processing when necessary.
Maintaining such a queue would be tricky, I think. One concern is that a task might pop the last item from the queue and prematurely terminate the job while that item's processing could still add new items. To fix this you would need to leave items in the queue until their map processing completes, but also ensure that no other map task removes them from the queue while they're being processed. Then you'd need to worry about handling failed tasks, whose items must become available for processing again.
I think it would be far simpler to do this iteratively, with multiple MapReduce passes. Each pass writes its files to a new temporary directory, which is the input for the next pass. This performs a breadth-first traversal of the space, with one MapReduce stage at each depth.
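To make the iterative approach concrete, here is a rough sketch that simulates the passes on the local filesystem in plain Python rather than with the Hadoop API. Everything in it is an assumption for illustration: the `CHILDREN` table, `map_item`, `run_pass`, `traverse`, and the `depth-N` and `part-00000` names are hypothetical stand-ins, not part of any real job. The point it shows is that the loop ends when a pass writes no output, so no shared queue is needed to decide when the work is done.

```python
import tempfile
from pathlib import Path

# Hypothetical work items: each key lists the follow-up items it produces.
CHILDREN = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": [],
}


def map_item(item):
    """Stand-in for the map function: return items needing further processing."""
    return CHILDREN.get(item, [])


def run_pass(input_dir, output_dir):
    """Simulate one pass: read every input file, write follow-up items."""
    emitted = []
    for part in sorted(input_dir.iterdir()):
        for item in part.read_text().split():
            emitted.extend(map_item(item))
    # Deduplicate within the pass, as a reduce step could.
    unique = sorted(set(emitted))
    output_dir.mkdir()
    (output_dir / "part-00000").write_text("\n".join(unique))
    return unique


def traverse(seed_items, work_root):
    """Run passes until one produces nothing; return the items at each depth."""
    depth = 0
    current = work_root / f"depth-{depth}"
    current.mkdir()
    (current / "part-00000").write_text("\n".join(seed_items))
    levels = [sorted(seed_items)]
    while True:
        depth += 1
        next_dir = work_root / f"depth-{depth}"
        produced = run_pass(current, next_dir)
        if not produced:
            break  # an empty pass means the traversal is complete
        levels.append(produced)
        current = next_dir
    return levels


def demo():
    with tempfile.TemporaryDirectory() as root:
        return traverse(["a"], Path(root))
```

Calling `demo()` returns one list of items per depth. Note that this sketch only removes duplicates within a single pass; if the space contains cycles, you would also need to filter out items seen at earlier depths, or the loop would not terminate.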
Doug
