Currently, we cannot guarantee full data locality for MR jobs. We can, and (I believe in 0.19) we do, make an effort to place TableInputFormat map tasks on the same node that hosts the region. From there, however, the storefiles that make up the region could live on any datanode. So it's impossible to ensure that all data is local along the whole Task -> RegionServer -> DataNode path.
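
To make the gap concrete, here is a rough toy model, not HBase or Hadoop code. The `Region` class, the host names, and the replica layout are all made up for illustration. It assumes the map task has been scheduled on the node hosting the region, and then counts how many of the region's storefile blocks happen to have a replica on that same node.

```python
# Toy model only: none of these names are HBase or Hadoop APIs.
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    regionserver_host: str
    # One entry per storefile block: the set of datanode hosts holding a replica.
    block_replicas: list


def local_block_fraction(region, task_host):
    """Return the fraction of the region's blocks with a replica on task_host."""
    if not region.block_replicas:
        return 0.0
    local = sum(1 for hosts in region.block_replicas if task_host in hosts)
    return local / len(region.block_replicas)


# Hypothetical layout: the region is served from node1, but its blocks are
# spread over the cluster wherever HDFS happened to place them.
region = Region(
    name="r1",
    regionserver_host="node1",
    block_replicas=[
        {"node1", "node4"},
        {"node2", "node3"},
        {"node1", "node5"},
        {"node3", "node6"},
    ],
)

# First hop (Task -> RegionServer) is local: run the task where the region is.
task_host = region.regionserver_host

# Second hop (RegionServer -> DataNode) is only partly local.
fraction = local_block_fraction(region, task_host)
print(f"{fraction:.0%} of blocks are local to {task_host}")
```

In this made-up layout only two of the four blocks sit on the region's node, so even a perfectly placed map task still reads half of its data over the network.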

Being able to encourage a region's blocks to be co-hosted on the node serving the region would unlock tremendous value in that case, and in other cases like HADOOP-4801. I'm still hoping something comes of that; unfortunately, it's not even on my radar to look into myself.

JG

> -----Original Message-----
> From: Wes Chow [mailto:[email protected]]
> Sent: Wednesday, April 01, 2009 6:19 AM
> To: [email protected]
> Subject: mapreduce locality
>
>
> When running MapReduce processes with HBase, is it possible to have
> Hadoop move the job to the machine that contains the relevant HStore? I
> thought I read that it does do this at some point, but I'm unable to
> find that reference at this moment...
>
> Wes
