Hello everyone,

I have a question about the intermediate data output by the map function: does this intermediate data get written to HDFS, or does it stay on the node's local storage? According to the MapReduce paper, the intermediate data is run through a hash function which maps every key to a given reduce worker. So how does this whole process happen? Does the map worker write the intermediate data to HDFS and then tell the JobTracker (Master) which reduce worker should be allotted this data? Or does the map worker keep the intermediate data in memory and make an RPC call directly to the reduce worker (the one determined by the hash function) to transfer the intermediate data?
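To make the question concrete, here is a minimal sketch of what I imagine the partitioning step looks like, based only on the paper's description. The class name, the getPartition signature, and the numReduceTasks parameter are my own guesses for illustration and may not match Hadoop's actual API:

// A sketch of hash partitioning as described in the MapReduce paper:
// each intermediate key is hashed to one of R reduce tasks.
// (The names and signature below are my guesses, not verified
// against the Hadoop source.)
public class SimpleHashPartitioner<K, V> {

    // Returns the index (0 .. numReduceTasks - 1) of the reduce
    // task that should receive all records sharing this key.
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask the sign bit so hashCode() can't produce a negative
        // partition, then reduce modulo the number of reduce tasks.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

If something like this runs inside the map task, I would like to understand where each partition's output goes next.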
It would be great if you could point me to where these functionalities are implemented in Hadoop, and also to where the hash function used on the map side lives. Thanks again for the great support on this mailing list.

Regards,
--
Ahmad Humayun
Research Assistant
Computer Science Dpt., LUMS
+92 321 4457315
