Assuming that node A only contains replicas, there is no guarantee that its data would never be read. First, you might lose a replica elsewhere: the copy on node A could then be used to re-create the missing replica. Second, data locality is best effort. If all the map slots are occupied except one on a node that has no replica of the data, then node A is as likely as any other node to be chosen as the source.
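For what it's worth, you can check where the replicas of your reduce output actually landed by asking the NameNode. A minimal sketch using the standard FileSystem API (the path is just a placeholder; point it at one of your job's output files):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHosts {
    public static void main(String[] args) throws Exception {
        // Example path: replace with one of your reduce output files.
        Path path = new Path(args.length > 0 ? args[0]
                : "/user/marc/output/part-00000");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(path);

        // Ask the NameNode which DataNodes hold each block of the file.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (int i = 0; i < blocks.length; i++) {
            System.out.println("block " + i + " hosts: "
                    + java.util.Arrays.toString(blocks[i].getHosts()));
        }
        fs.close();
    }
}

If node A never showed up there, you would still only be covering the initial placement; re-replication after a lost replica can put readable copies on it later.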
Regards

Bertrand

On Fri, Aug 24, 2012 at 10:09 PM, Marc Sturlese <marc.sturl...@gmail.com> wrote:

> Hey there,
> I have a doubt about reduce tasks and block writes. Does a reduce task
> always write first to HDFS on the node where it is placed? (and then these
> blocks would be replicated to other nodes)
> If so: if I have a cluster of 5 nodes, 4 of them running DN and TT and one
> (node A) running just DN, then when running MR jobs, map tasks would never
> read from node A? This would be because maps have data locality, and if
> reduce tasks write first to the node where they live, one replica of each
> block would always be on a node that has a TT. Node A would only contain
> blocks created by the framework's replication, as no reduce task would
> write there directly. Is this correct?
> Thanks in advance
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/doubt-about-reduce-tasks-and-block-writes-tp4003185.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

--
Bertrand Dechoux