Milind, you are right. But that only happens when your client runs on one of the datanodes in HDFS; otherwise a random node is picked for the first replica.
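To make the placement rule above concrete, here is a minimal sketch (not Hadoop source) of the default HDFS replica placement for replication 3: first replica on the writer's node if the writer is a datanode (else a random node), second on a different rack, third on a different node on the second replica's rack. Node and rack names are made up for illustration.

```python
import random

def place_replicas(datanodes, racks, writer=None):
    """datanodes: list of node names; racks: dict node -> rack id;
    writer: node name if the client runs on a datanode, else None."""
    # First replica: the local node if the writer is a datanode, else random.
    first = writer if writer in datanodes else random.choice(datanodes)
    # Second replica: a node on a different rack from the first.
    off_rack = [n for n in datanodes if racks[n] != racks[first]]
    second = random.choice(off_rack)
    # Third replica: a different node on the same rack as the second.
    same_rack = [n for n in datanodes
                 if racks[n] == racks[second] and n != second]
    third = random.choice(same_rack)
    return [first, second, third]
```

Running this with the writer on a datanode shows why one node ends up holding a replica of every block of a file written from a single task: the first replica is always local.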
On Fri, Oct 22, 2010 at 3:37 PM, Milind A Bhandarkar <[email protected]> wrote:

> If a file of, say, 12.5 GB were produced by a single task with replication
> 3, the default replication policy will ensure that the first replica of
> each block is created on the local datanode. So there will be one datanode
> in the cluster that contains one replica of all blocks of that file. The
> map placement hint specifies that node.
>
> It's evil, I know :-)
>
> - Milind
>
> On Oct 21, 2010, at 1:30 PM, Alex Kozlov wrote:
>
> > Hmm, this is interesting: how did it manage to keep the blocks local?
> > Why was performance better?
> >
> > On Thu, Oct 21, 2010 at 11:43 AM, Owen O'Malley <[email protected]> wrote:
> >
> >> The block sizes were 2 GB. The input format made splits that were more
> >> than a block because that led to better performance.
> >>
> >> -- Owen
>
> --
> Milind Bhandarkar
> (mailto:[email protected])
> (phone: 408-203-5213 W)
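For readers wondering how splits can end up larger than a block, as Owen describes: the newer-style FileInputFormat computes splitSize = max(minSize, min(maxSize, blockSize)), so raising the minimum split size above the block size makes each split span multiple blocks. The arithmetic below uses the thread's numbers (2 GB blocks, a 12.5 GB file); the helper functions are illustrative, not Hadoop code, and the 4 GB minimum is an assumed example value.

```python
GB = 1024 ** 3

def split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Sketch of the FileInputFormat-style formula:
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_splits(file_len, size):
    # Ceiling division: the last split may be shorter than the rest.
    return -(-file_len // size)

block = 2 * GB
file_len = int(12.5 * GB)

# Default: split size equals the 2 GB block size -> 7 splits for 12.5 GB.
default = split_size(block)

# Raising min_size above the block size (assumed 4 GB here) makes each
# split cover more than one block -> 4 splits, so fewer map tasks.
bigger = split_size(block, min_size=4 * GB)
```

With fewer, larger splits each map task reads more data sequentially, which is one plausible reason the larger-than-block splits performed better in the case above.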
