If a file of, say, 12.5 GB is produced by a single task with replication 3, 
the default replication policy ensures that the first replica of each block 
is created on the local datanode. So there will be one datanode in the 
cluster that holds a replica of every block of that file. The map placement 
hint specifies that node.
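
Here is a rough sketch of that reasoning as a toy simulation, not actual HDFS
code. It assumes the 2G block size Owen mentioned, a hypothetical 20-node
cluster (dn0..dn19) with the writing task on dn7, and it simplifies the policy
by putting the second and third replicas on random other nodes instead of
following the real rack-aware rules.

```python
import math
import random
from collections import Counter

GB = 1024 ** 3


def place_blocks(file_size, block_size, writer, nodes, replication=3, seed=0):
    """Return one list of replica hosts per block of the file."""
    rng = random.Random(seed)
    others = [n for n in nodes if n != writer]
    num_blocks = math.ceil(file_size / block_size)
    placement = []
    for _ in range(num_blocks):
        # First replica goes to the writer's local datanode.
        # Remaining replicas: random distinct other nodes (simplification).
        placement.append([writer] + rng.sample(others, replication - 1))
    return placement


def best_host(placement):
    """Return (host, block_count) for the node holding the most blocks."""
    counts = Counter(node for replicas in placement for node in replicas)
    return counts.most_common(1)[0]


# Hypothetical cluster and writer node, for illustration only.
nodes = [f"dn{i}" for i in range(20)]
placement = place_blocks(12.5 * GB, 2 * GB, writer="dn7", nodes=nodes)
host, held = best_host(placement)
print(f"{len(placement)} blocks; {host} holds a replica of {held} of them")
```

The writer node ends up with a replica of every block, so a split spanning
the whole file can name that one host as its preferred location and still
read everything locally.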

It's evil, I know :-)

- Milind

On Oct 21, 2010, at 1:30 PM, Alex Kozlov wrote:

> Hmm, this is interesting: how did it manage to keep the blocks local?  Why
> was performance better?
> 
> On Thu, Oct 21, 2010 at 11:43 AM, Owen O'Malley <[email protected]> wrote:
> 
>> The block sizes were 2G. The input format made splits that were more than a
>> block because that led to better performance.
>> 
>> -- Owen
>> 

--
Milind Bhandarkar
(mailto:[email protected])
(phone: 408-203-5213 W)

