Following on from Cosmin's comment, the bit where HBase tells the MR
framework the regionserver addresses, so that it assigns tasks to the
tasktracker running beside the hosting regionserver, was committed recently:
"HBASE-675 Report correct server hosting a table split for assignment to for
MR Jobs". In coarse tests, network traffic is halved.
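
To make the idea concrete, here is a rough sketch in plain Java. It is not
the HBASE-675 patch and does not use the Hadoop or HBase classes; the names
LocalitySketch, TableSplitSketch and pickTracker are made up for
illustration. It assumes a split reports the host of the regionserver
serving its region (the same idea as Hadoop's InputSplit.getLocations()),
and that a scheduler prefers a tasktracker on that host when one exists.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model only: none of these types are Hadoop or HBase classes.
final class LocalitySketch {

    /** A table split that remembers which host serves its region. */
    static final class TableSplitSketch {
        final String regionName;
        final String regionServerHost;

        TableSplitSketch(String regionName, String regionServerHost) {
            this.regionName = regionName;
            this.regionServerHost = regionServerHost;
        }

        /** Preferred hosts for the task: here, just the hosting regionserver. */
        String[] getLocations() {
            return new String[] { regionServerHost };
        }
    }

    /**
     * Pick a tasktracker host for the split. Prefer a tracker on one of the
     * split's reported hosts; otherwise fall back to the first tracker.
     */
    static String pickTracker(TableSplitSketch split, List<String> trackerHosts) {
        if (trackerHosts.isEmpty()) {
            throw new IllegalArgumentException("no tasktrackers available");
        }
        for (String host : split.getLocations()) {
            if (trackerHosts.contains(host)) {
                return host; // local: the task reads from the regionserver beside it
            }
        }
        return trackerHosts.get(0); // remote: region data crosses the network
    }

    public static void main(String[] args) {
        List<String> trackers = Arrays.asList("node1", "node2", "node3");
        TableSplitSketch split = new TableSplitSketch("region-a", "node2");
        System.out.println(split.regionName + " -> " + pickTracker(split, trackers));
    }
}
```

When the split reports no host that runs a tasktracker, the sketch falls
back to a remote tracker, which is the case the locality hint is meant to
make rare.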

The piece where HBase region assignment takes data locality into
consideration, so that regions are assigned to the regionserver running on
the same node as the datanode hosting the regions' data, is yet to be done.

St.Ack

On Wed, Nov 26, 2008 at 3:36 AM, Cosmin Lehene <[EMAIL PROTECTED]> wrote:

> It doesn't currently do that. However this seems to be on HBase roadmap.
> See Data-Locality Awareness
>
> The Hadoop map reduce framework makes a best effort at running tasks
> on the server hosting the task data after the dictum that it's cheaper moving
> the processing to the data rather than the inverse. HBase needs smarts to
> assign regions to the region server that is running on the server hosting
> the regions' data. HBase needs to supply map reduce hints such that the
> Hadoop framework runs tasks beside the region server hosting the task input.
> These changes will make for savings in network I/O.
>
> http://wiki.apache.org/hadoop/HBase/RoadMaps
>
> Regards,
> Cosmin
>
> On 11/26/08 1:32 PM, "David Faitelson" <[EMAIL PROTECTED]>
> wrote:
>
> Hi,
>
> Does HBase/Hadoop create map tasks on the same data node that
> contains the region for the map task?
>
> I know that Bigtable does something like that but I could not find
> any mention of this optimization in the documentation of HBase.
>
> Thanks,
> David
>
>
