If a file of, say, 12.5 GB were produced by a single task with replication 3, the default replication policy ensures that the first replica of each block is created on the local datanode. So there will be one datanode in the cluster that holds a replica of every block of that file. The map placement hint specifies that node.
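To make the argument concrete, here is a rough sketch of the placement in Python. It is a toy model, not HDFS code: the function names and node names are made up, and the non-local replicas are picked at random instead of by the real rack-aware policy. It assumes the 2G block size mentioned in the quoted thread.

```python
import random

GB = 1024 ** 3


def place_blocks(file_size, block_size, writer, nodes, replication=3, seed=0):
    """Toy model of block placement for one file written by one task.

    Assumption: the first replica of each block goes to the writer's
    local datanode; the other replicas go to random distinct nodes
    (real HDFS is rack-aware, which this sketch ignores).
    Returns a list of (block_length, [replica_nodes]).
    """
    rng = random.Random(seed)
    others = [n for n in nodes if n != writer]
    blocks = []
    remaining = file_size
    while remaining > 0:
        length = min(block_size, remaining)
        replicas = [writer] + rng.sample(others, replication - 1)
        blocks.append((length, replicas))
        remaining -= length
    return blocks


def nodes_holding_every_block(blocks):
    """Return the set of nodes that hold a replica of every block."""
    common = set(blocks[0][1])
    for _, replicas in blocks[1:]:
        common &= set(replicas)
    return common


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(10)]  # hypothetical cluster
    blocks = place_blocks(int(12.5 * GB), 2 * GB, "node-0", nodes)
    print(len(blocks), "blocks")
    print("nodes with every block:", nodes_holding_every_block(blocks))
```

With 2 GB blocks a 12.5 GB file has 7 blocks, and the writer's node always ends up in the set of nodes holding all of them, so a split spanning several blocks can still be read locally if the map is scheduled there.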
It's evil, I know :-)

- Milind

On Oct 21, 2010, at 1:30 PM, Alex Kozlov wrote:

> Hmm, this is interesting: how did it manage to keep the blocks local? Why
> performance was better?
>
> On Thu, Oct 21, 2010 at 11:43 AM, Owen O'Malley <[email protected]> wrote:
>
>> The block sizes were 2G. The input format made splits that were more than a
>> block because that led to better performance.
>>
>> -- Owen

--
Milind Bhandarkar
(mailto:[email protected])
(phone: 408-203-5213 W)
