[ https://issues.apache.org/jira/browse/HBASE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613609#action_12613609 ]

Billy Pearson commented on HBASE-675:
-------------------------------------

Since the load from MR jobs falls on the region server rather than the datanode,
moving the task close to the region server would be ideal.

The problem with the current setup is that the client/MR task has to talk to the
region server over the network for each request. The region server then has to
talk to the datanode for the data, and the returned data then goes back to the
map task. That's 4 hops for each request.

If the task ran on the same server that hosts the region, only the region server
to datanode hops would remain, speeding up each request.
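To make the hop arithmetic concrete, here is a minimal sketch that counts how many
legs of one read request cross the network. It assumes the request path described
above (task, region server, datanode, region server, task) and that the datanode
holding the data sits on a different host than the region server. The class and
host names are hypothetical and are not HBase code.

```java
import java.util.List;

/** Hypothetical model: counts the network hops made by one MR read request. */
public class HopCount {
    /** One leg of a request: a message or data moving from one host to another. */
    record Hop(String from, String to) {
        boolean crossesNetwork() {
            return !from.equals(to); // same host means loopback, not a network hop
        }
    }

    /** Legs of a single read: task -> region server -> datanode -> region server -> task. */
    static List<Hop> requestPath(String taskHost, String regionHost, String dataHost) {
        return List.of(
                new Hop(taskHost, regionHost),   // task sends the request
                new Hop(regionHost, dataHost),   // region server asks the datanode for the data
                new Hop(dataHost, regionHost),   // datanode returns the data
                new Hop(regionHost, taskHost));  // region server returns the result to the task
    }

    static long networkHops(String taskHost, String regionHost, String dataHost) {
        return requestPath(taskHost, regionHost, dataHost).stream()
                .filter(Hop::crossesNetwork)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(networkHops("nodeA", "nodeB", "nodeC")); // remote task: 4
        System.out.println(networkHops("nodeB", "nodeB", "nodeC")); // task on the region's server: 2
    }
}
```

With a remote task this prints 4; co-locating the task with the region server drops
it to 2, leaving only the region server to datanode legs.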

Assuming the average setup runs the following on each server:
Hadoop datanode
HBase region server
X mapper tasks
X reducer tasks

Having each task work against its local region server would then also spread the
load and help avoid overloading any one region server at once.
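A toy model of the load-spreading argument: hosts ask for work in turn, and the
scheduler optionally prefers a split whose reported location matches the asking
host, roughly the way the MR framework uses split locations for locality.
Everything here (the class names, the round-robin turns, and a six-region table
with two consecutive regions per server) is a hypothetical assumption chosen for
illustration.

```java
import java.util.*;

/** Hypothetical locality-aware assignment of table splits to task hosts. */
public class LocalityScheduler {
    /** A split: a region and the region server actually serving it. */
    record Split(String region, String servedBy) {}

    /** One task placement; local when the task runs on the region's server. */
    record Assignment(String taskHost, Split split) {
        boolean isLocal() { return taskHost.equals(split.servedBy()); }
    }

    /** Hosts ask for work round-robin; with useLocations, a local split is preferred. */
    static List<Assignment> assign(List<String> hosts, List<Split> splits, boolean useLocations) {
        List<Assignment> order = new ArrayList<>();
        List<Split> pending = new ArrayList<>(splits);
        for (int turn = 0; !pending.isEmpty(); turn++) {
            String host = hosts.get(turn % hosts.size());
            Split pick = pending.get(0); // default: next split in table order
            if (useLocations) {
                for (Split s : pending) {
                    if (host.equals(s.servedBy())) { pick = s; break; }
                }
            }
            pending.remove(pick);
            order.add(new Assignment(host, pick));
        }
        return order;
    }

    static long localCount(List<Assignment> order) {
        return order.stream().filter(Assignment::isLocal).count();
    }

    /** Most concurrent readers on one region server during the first wave (one task per host). */
    static int firstWavePeak(List<Assignment> order, int hostCount) {
        Map<String, Integer> load = new HashMap<>();
        for (Assignment a : order.subList(0, Math.min(hostCount, order.size()))) {
            load.merge(a.split().servedBy(), 1, Integer::sum);
        }
        return load.isEmpty() ? 0 : Collections.max(load.values());
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("A", "B", "C");
        List<Split> splits = List.of(new Split("r1", "A"), new Split("r2", "A"),
                new Split("r3", "B"), new Split("r4", "B"),
                new Split("r5", "C"), new Split("r6", "C"));
        for (boolean use : new boolean[] {false, true}) {
            List<Assignment> order = assign(hosts, splits, use);
            System.out.println("useLocations=" + use + " local=" + localCount(order)
                    + " firstWavePeak=" + firstWavePeak(order, hosts.size()));
        }
    }
}
```

In this model, ignoring locations leaves only 2 of 6 tasks local and puts two
first-wave tasks on server A at once; using locations makes all 6 tasks local and
gives each server exactly one reader in the first wave.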


> Report correct server hosting a table split for assignment to for MR Jobs
> -------------------------------------------------------------------------
>
>                 Key: HBASE-675
>                 URL: https://issues.apache.org/jira/browse/HBASE-675
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Billy Pearson
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> Currently we return a null String array to the MR framework, so it picks a
> random node for MR task assignment.
> class: org.apache.hadoop.hbase.mapred.tableSplit
> function getLocations()
> We should now be able to query META for the host name of the server
> currently hosting the region in question.
> This will help with scaling, as there will be less cross-server communication,
> removing bandwidth as a bottleneck.
> A side effect of fixing this is that it will help keep region servers from
> being overloaded by lots of MR clients all pulling from the same region server
> while there is local work for them to do.
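A minimal sketch of the proposed fix, under stated assumptions: a sorted map stands
in for META (region start row to serving host), rows are plain Strings compared
lexicographically rather than HBase byte arrays, and getLocations() returns the
hosting server's name, or an empty array (no locality preference) if the lookup
fails. FakeMeta, Split and makeSplit are hypothetical names, not the actual
org.apache.hadoop.hbase.mapred classes.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class TableSplitSketch {
    /** Hypothetical stand-in for META: region start row -> host of the serving region server. */
    static final class FakeMeta {
        private final TreeMap<String, String> startRowToHost = new TreeMap<>();

        void put(String regionStartRow, String host) {
            startRowToHost.put(regionStartRow, host);
        }

        /** The region containing a row is the one with the greatest start row <= that row. */
        String hostFor(String row) {
            Map.Entry<String, String> entry = startRowToHost.floorEntry(row);
            return entry == null ? null : entry.getValue();
        }
    }

    /** Hypothetical split that remembers which server hosts its region. */
    static final class Split {
        final String startRow;
        final String regionLocation; // null if META had no covering entry

        Split(String startRow, String regionLocation) {
            this.startRow = startRow;
            this.regionLocation = regionLocation;
        }

        /** Locality hint for the MR framework: the hosting server, or no preference. */
        String[] getLocations() {
            return regionLocation == null ? new String[0] : new String[] {regionLocation};
        }
    }

    static Split makeSplit(FakeMeta meta, String startRow) {
        return new Split(startRow, meta.hostFor(startRow));
    }

    public static void main(String[] args) {
        FakeMeta meta = new FakeMeta();
        meta.put("", "rs1.example.com");  // first region starts at the empty row
        meta.put("m", "rs2.example.com"); // second region starts at "m"
        System.out.println(Arrays.toString(makeSplit(meta, "apple").getLocations()));
        System.out.println(Arrays.toString(makeSplit(meta, "zebra").getLocations()));
    }
}
```

The floor lookup mirrors how a row maps to the region whose start key is the
largest one not exceeding it; with the two regions above, "apple" resolves to
rs1.example.com and "zebra" to rs2.example.com.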
