Milind, you are right. But that only happens when your client runs on one of the datanodes in HDFS; otherwise a random node is picked for the first replica.
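To make the placement rule above concrete, here is a minimal sketch (not Hadoop source) of the default HDFS replica placement for replication 3: first replica on the writer's node if the writer is a datanode (else a random node), second on a different rack, third on a different node on the second replica's rack. Node and rack names are made up for illustration.

```python
import random

def place_replicas(datanodes, racks, writer=None):
    """datanodes: list of node names; racks: dict node -> rack id;
    writer: node name if the client runs on a datanode, else None."""
    # First replica: the local node if the writer is a datanode, else random.
    first = writer if writer in datanodes else random.choice(datanodes)
    # Second replica: a node on a different rack from the first.
    off_rack = [n for n in datanodes if racks[n] != racks[first]]
    second = random.choice(off_rack)
    # Third replica: a different node on the same rack as the second.
    same_rack = [n for n in datanodes
                 if racks[n] == racks[second] and n != second]
    third = random.choice(same_rack)
    return [first, second, third]
```

Running this with the writer on a datanode shows why one node ends up holding a replica of every block of a file written from a single task: the first replica is always local.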
On Fri, Oct 22, 2010 at 3:37 PM, Milind A Bhandarkar <[email protected]> wrote:

> If a file of, say, 12.5 GB were produced by a single task with replication
> 3, the default replication policy will ensure that the first replica of
> each block is created on the local datanode. So there will be one datanode
> in the cluster that contains one replica of all blocks of that file. The
> map placement hint specifies that node.
>
> It's evil, I know :-)
>
> - Milind
>
> On Oct 21, 2010, at 1:30 PM, Alex Kozlov wrote:
>
> > Hmm, this is interesting: how did it manage to keep the blocks local?
> > Why was performance better?
> >
> > On Thu, Oct 21, 2010 at 11:43 AM, Owen O'Malley <[email protected]> wrote:
> >
> >> The block sizes were 2 GB. The input format made splits that were more
> >> than a block because that led to better performance.
> >>
> >> -- Owen
>
> --
> Milind Bhandarkar
> (mailto:[email protected])
> (phone: 408-203-5213 W)
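For readers wondering how splits can end up larger than a block, as Owen describes: the newer-style FileInputFormat computes splitSize = max(minSize, min(maxSize, blockSize)), so raising the minimum split size above the block size makes each split span multiple blocks. The arithmetic below uses the thread's numbers (2 GB blocks, a 12.5 GB file); the helper functions are illustrative, not Hadoop code, and the 4 GB minimum is an assumed example value.

```python
GB = 1024 ** 3

def split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Sketch of the FileInputFormat-style formula:
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_splits(file_len, size):
    # Ceiling division: the last split may be shorter than the rest.
    return -(-file_len // size)

block = 2 * GB
file_len = int(12.5 * GB)

# Default: split size equals the 2 GB block size -> 7 splits for 12.5 GB.
default = split_size(block)

# Raising min_size above the block size (assumed 4 GB here) makes each
# split cover more than one block -> 4 splits, so fewer map tasks.
bigger = split_size(block, min_size=4 * GB)
```

With fewer, larger splits each map task reads more data sequentially, which is one plausible reason the larger-than-block splits performed better in the case above.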
