Jonathan Gray wrote:
Currently, we cannot guarantee perfect data locality for MR jobs.

We can, and (I believe in 0.19) we do, make an effort to run
TableInputFormat map tasks on the same node that hosts the region.  From
there, the actual storefiles that make up the region could be located
on any datanode.  So it's impossible to ensure all data is local along the
whole Task -> RegionServer -> DataNode path.
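
To make the two hops concrete, here is a minimal toy model of the situation
described above. It is not HBase or Hadoop code: the function names, host
names, and storefile names are all hypothetical, and it only illustrates that
a task can sit next to its region server while some storefile blocks still
live on other datanodes.

```python
# Toy model only: nothing here calls HBase or Hadoop. All names are made up.

def pick_task_host(region_host, free_hosts):
    """Prefer the node hosting the region; otherwise fall back to any free node."""
    if region_host in free_hosts:
        return region_host
    # Deterministic fallback so the example is reproducible.
    return sorted(free_hosts)[0] if free_hosts else None


def locality_report(task_host, region_host, storefile_block_hosts):
    """Report which hops of Task -> RegionServer -> DataNode stay on one node.

    storefile_block_hosts maps a storefile name to the set of datanodes
    that hold a replica of its blocks (an assumed, simplified view).
    """
    task_to_rs_local = task_host == region_host
    local_files = sorted(
        name for name, hosts in storefile_block_hosts.items()
        if region_host in hosts
    )
    remote_files = sorted(set(storefile_block_hosts) - set(local_files))
    return {
        "task_to_regionserver_local": task_to_rs_local,
        "local_storefiles": local_files,
        "remote_storefiles": remote_files,
        "fully_local": task_to_rs_local and not remote_files,
    }


# Hypothetical cluster: the region is served from node1, but only one of its
# two storefiles has a block replica on node1.
blocks = {
    "storefile-a": {"node1", "node3"},
    "storefile-b": {"node2", "node3"},
}
host = pick_task_host("node1", {"node1", "node2"})
report = locality_report(host, "node1", blocks)
print(host, report)
```

In this made-up layout the first hop is local, but the second hop is not
fully local, which is the gap that co-hosting a region's blocks with the
region would close.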

Being able to encourage a region's blocks to be co-hosted on the node that
serves the region would unlock tremendous value in this case, and in other
cases like HADOOP-4801.  Still hoping something comes of that; unfortunately,
it's not even on my radar to look into myself.


I guess in a sense you could use column families to group data that would benefit from locality?


Wes

-----Original Message-----
From: Wes Chow [mailto:[email protected]]
Sent: Wednesday, April 01, 2009 6:19 AM
To: [email protected]
Subject: mapreduce locality


When running MapReduce processes with HBase, is it possible to have
Hadoop move the job to the machine that contains the relevant HStore? I
thought I read that it does do this at some point, but I'm unable to
find that reference at this moment...

Wes
