Definitely.

HBase is all about data locality.  Everything is grouped by (row+family)
and stored in row order, so crossing to the next row for the same family is
"cheap" (but that's only available in a scanner).  So, in general, you want
to keep things that will be read together in the same row+family; if you
need to cross rows, keep them in the same family.
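
As a rough picture of that layout, here is a toy in-memory model.  It is
not HBase's storage code: the FamilyStore class and its put/scan methods
are made up for this sketch.  It only shows why, with one row-sorted store
per family, a scan over adjacent rows in a single family is one contiguous
walk.

```python
import bisect


class FamilyStore:
    """Toy model of one column family's store: cells kept sorted by row key."""

    def __init__(self):
        self._rows = []    # row keys, kept in sorted order
        self._cells = {}   # row key -> {qualifier: value}

    def put(self, row, qualifier, value):
        if row not in self._cells:
            bisect.insort(self._rows, row)  # keep row order on insert
            self._cells[row] = {}
        self._cells[row][qualifier] = value

    def scan(self, start_row, stop_row):
        """Yield (row, cells) for start_row <= row < stop_row, in row order."""
        lo = bisect.bisect_left(self._rows, start_row)
        hi = bisect.bisect_left(self._rows, stop_row)
        for row in self._rows[lo:hi]:  # one contiguous slice of the store
            yield row, self._cells[row]


store = FamilyStore()
for row in ["user3", "user1", "user2"]:
    store.put(row, "name", row.upper())

# Adjacent rows in the same family come back from a single ordered walk.
scanned = list(store.scan("user1", "user3"))
```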

Grabbing multiple families is not particularly efficient today; it's really
like running separate, sequential (not parallel) reads.  There's lots of
room for improvement here, some of which will be seen in 0.20.  Stay
tuned to HBASE-1249 and related issues for details.
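
To make the cost concrete, here is another toy sketch.  ToyTable and its
store_reads counter are hypothetical and do not reflect the real read
path; they only model the claim above that a read touching N families
behaves like N reads done one after another.

```python
class ToyTable:
    """Toy model: one independent store per column family."""

    def __init__(self, families):
        self.stores = {family: {} for family in families}
        self.store_reads = 0  # how many per-family stores have been consulted

    def put(self, row, family, qualifier, value):
        self.stores[family].setdefault(row, {})[qualifier] = value

    def get(self, row, families):
        result = {}
        for family in families:  # one store after another, never in parallel
            self.store_reads += 1
            result[family] = dict(self.stores[family].get(row, {}))
        return result


table = ToyTable(["info", "stats"])
table.put("row1", "info", "name", "wes")
table.put("row1", "stats", "hits", 7)

table.get("row1", ["info"])
one_family_reads = table.store_reads

both = table.get("row1", ["info", "stats"])
two_family_reads = table.store_reads - one_family_reads
```

In this model the second get costs twice as many store visits as the
first, which is the argument for keeping data that is read together in
one family.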

JG

> -----Original Message-----
> From: Wes Chow [mailto:[email protected]]
> Sent: Wednesday, April 01, 2009 8:11 AM
> To: [email protected]
> Subject: Re: mapreduce locality
> 
> 
> 
> Jonathan Gray wrote:
> > Currently, we cannot be perfect with MR jobs running locally.
> >
> > We can, and (I believe in 0.19) we do, make an effort to put
> > TableInputFormat map tasks on the same nodes as the region is hosted.
> > From there, the actual locations of the storefiles that make up the
> > region could be on any datanode.  So it's impossible to ensure all data
> > is local from the Task -> RegionServer -> DataNode.
> >
> > Being able to encourage a region's blocks to be co-hosted on the node
> > with the region would unlock tremendous value in that case, and in other
> > cases like HADOOP-4801.  Still hoping something comes of that;
> > unfortunately it's not even on my radar to look into myself.
> 
> 
> I guess in a sense you could use column families to group data that
> would benefit from locality?
> 
> 
> Wes
> 
> >> -----Original Message-----
> >> From: Wes Chow [mailto:[email protected]]
> >> Sent: Wednesday, April 01, 2009 6:19 AM
> >> To: [email protected]
> >> Subject: mapreduce locality
> >>
> >>
> >> When running MapReduce processes with HBase, is it possible to have
> >> Hadoop move the job to the machine that contains the relevant HStore?
> >> I thought I read that it does do this at some point, but I'm unable to
> >> find that reference at this moment...
> >>
> >> Wes
> >
