Hi, I have two MapReduce jobs that run sequentially to accomplish a task. I first ran the jobs locally on a single machine. The first MapReduce job produces a set of keys, which I stored in memory in a Set in the reduce instead of emitting them with output.collect. The second MapReduce job, working on different input files, looks up the keys in that Set to act on the input lines. Now I want to run the jobs on a small cluster, where in-memory storage will not work. How can the second job's map tasks, running on various machines, load all the keys from the first MapReduce job before they start working on the input files? Any ideas?
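For reference, the local setup described above looks roughly like the sketch below. It is plain Java without the Hadoop classes, and the names (LocalKeySetSketch, firstJobReduce, secondJobMap) are made up for illustration. It also assumes, only for the sake of the example, that the key is the first tab-separated field of each input line and that "acting on" a line means keeping it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LocalKeySetSketch {
    // Shared in-memory key store. A static field is only visible inside
    // the JVM that filled it, so tasks in other JVMs would see it empty.
    public static final Set<String> KEYS = new HashSet<>();

    // Stand-in for the first job's reduce: remembers the key in the Set
    // instead of emitting it as job output.
    public static void firstJobReduce(String key) {
        KEYS.add(key);
    }

    // Stand-in for the second job's map: keeps the lines whose first
    // tab-separated field (an assumed layout) is one of the stored keys.
    public static List<String> secondJobMap(List<String> lines) {
        List<String> matched = new ArrayList<>();
        for (String line : lines) {
            String field = line.split("\t", 2)[0];
            if (KEYS.contains(field)) {
                matched.add(line);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // First "job": collect keys in memory.
        firstJobReduce("k1");
        firstJobReduce("k3");
        // Second "job": different input, filtered against the stored keys.
        List<String> out = secondJobMap(Arrays.asList("k1\ta", "k2\tb", "k3\tc"));
        System.out.println(out);
    }
}
```

This works when both jobs run in the same JVM, which is why it was fine on a single machine. On the cluster, the second job's map tasks would start with an empty Set.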
Many thanks,
Sandhya
