A quick answer would be - it is heartbeat communication mechanism, poll-like flow between JT/TT's. Also for communication underneath its RPC, and not the default Java serialization but a hadoop specific serialization implementation to have some performance gains.
AVRO is in strong contention to be used in hadoop for serialization.You might like to also look up into Thrift, Google Protocol Buffers. Cheers, / On 4/21/10 4:37 PM, "Ahmad Shahzad" <ashahz...@gmail.com> wrote: Hey everyone, I wanted to know that which communication protocols hadoop mapreduce uses under the hood to provide communication if any. For example for the shuffle process it uses http to shuffle the values to the reducers. So, job tracker has to talk to task trackers, and task trackers have to report back to job trackers, and what about if the data is not available on the same node and the slave node has to fetch the data from other node. In all of the cases which communication mechanisms are used to achieve the communication, is it http only?? I would really appreciate if someone can tell me regarding this thing or if someone has some link that can help me regarding this issue. Regards, Ahmad Shahzad