Currently, we cannot guarantee full data locality for MR jobs.

We can, and (I believe in 0.19) we do, make an effort to schedule
TableInputFormat map tasks on the node that hosts the region.  From there,
though, the blocks of the storefiles that make up the region could be on
any datanode.  So it's impossible to ensure the data is local along the
whole path Task -> RegionServer -> DataNode.
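
To make the three hops concrete, here is a toy Python model.  It is not
HBase code: Region, split_location_hint and local_block_fraction are names
made up for illustration, and it assumes one replica per block for
simplicity.  It only shows that the scheduling hint can match the
regionserver's node while some of the storefile blocks sit elsewhere.

```python
# Toy model of Task -> RegionServer -> DataNode locality.
# Hypothetical names throughout; this does not use any HBase or Hadoop API.
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    regionserver_host: str       # node currently serving the region
    storefile_block_hosts: list  # datanode holding each storefile block
                                 # (assumption: a single replica per block)


def split_location_hint(region):
    """Host the scheduler is asked to prefer for this region's map task."""
    return region.regionserver_host


def local_block_fraction(region):
    """Fraction of the region's storefile blocks on the regionserver's node."""
    blocks = region.storefile_block_hosts
    if not blocks:
        return 1.0  # nothing on disk yet, so nothing is remote
    local = sum(1 for host in blocks if host == region.regionserver_host)
    return local / len(blocks)


region = Region("r1", "node-a", ["node-a", "node-c", "node-d", "node-a"])
hint = split_location_hint(region)
fraction = local_block_fraction(region)
print(hint, fraction)  # prints: node-a 0.5
```

Here the map task lands next to the regionserver, but half of the blocks
still have to be read from other datanodes.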

Being able to encourage a region's blocks to be co-hosted on the node
serving the region would unlock tremendous value in this case, and in other
cases like HADOOP-4801.  Still hoping something comes of that;
unfortunately, it's not even on my radar to look into myself.

JG

> -----Original Message-----
> From: Wes Chow [mailto:[email protected]]
> Sent: Wednesday, April 01, 2009 6:19 AM
> To: [email protected]
> Subject: mapreduce locality
> 
> 
> When running MapReduce processes with HBase, is it possible to have
> Hadoop move the job to the machine that contains the relevant HStore? I
> thought I read that it does do this at some point, but I'm unable to
> find that reference at this moment...
> 
> Wes
