[ 
https://issues.apache.org/jira/browse/HBASE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070314#comment-14070314
 ] 

Jean-Marc Spaggiari commented on HBASE-11562:
---------------------------------------------

{quote}
This sounds like a useful detail. Make it the default behavior?
{quote}
This will change the default behaviour. That's why I preferred to keep it 
false by default: that keeps the current behaviour but still makes the option 
available. Maybe we can turn it on by default in 0.99 and leave it false by 
default on the others?

{quote}
In practice, shouldn't a well balanced table have fairly random region -> 
RegionServer distribution? 
{quote}
Yes, it should. On a 100-node cluster with a 3000-region table, when you 
pick up the first 300 regions you will most probably have about 3 regions on 
each server. So all 100 servers will send a lot of puts. Now, if the 
destination is a 10-region table, all those 300 source regions will most 
probably go into the same region, and so to the same region server. So the 
issue is on the destination side more than in the source table distribution.
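
To make that arithmetic concrete, here is a rough sketch. It is not HBase code: 
the class and method names are made up for illustration, and it assumes both 
tables split the same key space into equal-sized regions. It counts how many 
destination regions the first 300 map tasks write to, first in key order and 
then in a shuffled order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in HBase.
class HotspotSketch {

    // Source region indexes 0..n-1 in key order, the order map tasks run in today.
    static List<Integer> keyOrder(int sourceRegions) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sourceRegions; i++) {
            order.add(i);
        }
        return order;
    }

    // The same indexes in a random order (seeded so runs are repeatable).
    static List<Integer> shuffledOrder(int sourceRegions, long seed) {
        List<Integer> order = keyOrder(sourceRegions);
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    // Assumption: both tables split the same key space into equal-sized regions.
    static int destinationRegion(int sourceRegion, int sourceRegions, int destRegions) {
        return (int) ((long) sourceRegion * destRegions / sourceRegions);
    }

    // Number of distinct destination regions written to by the first `tasks` map tasks.
    static int destinationsHit(List<Integer> order, int tasks,
                               int sourceRegions, int destRegions) {
        Set<Integer> hit = new HashSet<>();
        for (int source : order.subList(0, tasks)) {
            hit.add(destinationRegion(source, sourceRegions, destRegions));
        }
        return hit.size();
    }

    public static void main(String[] args) {
        int sourceRegions = 3000, destRegions = 10, tasks = 300;
        System.out.println("key order: "
            + destinationsHit(keyOrder(sourceRegions), tasks, sourceRegions, destRegions));
        System.out.println("shuffled:  "
            + destinationsHit(shuffledOrder(sourceRegions, 42L), tasks, sourceRegions, destRegions));
    }
}
```

Under those assumptions the key-ordered run hits a single destination region, 
while the shuffled run spreads the same 300 tasks over several of them.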

{quote}
You see logically adjacent regions piling up on the same RS?
{quote}

Not necessarily, but since we balance per cluster and not per table, there is 
still a chance that 2 of the first 10 regions end up on the same region 
server, which makes things even worse.
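
As a rough feel for those odds, here is a small sketch. It is not balancer 
code, and the names are hypothetical. It assumes each of 10 regions lands on 
one of 100 region servers uniformly and independently, which the real 
balancer does not promise, and computes the chance that at least two of them 
share a server.

```java
// Hypothetical illustration only; this is not how the HBase balancer places regions.
class SharedServerOdds {

    // Probability that at least two of `regions` regions share a server, assuming
    // each one lands on one of `servers` servers uniformly and independently.
    static double atLeastTwoShare(int regions, int servers) {
        double allDistinct = 1.0;
        for (int i = 0; i < regions; i++) {
            // The (i+1)-th region must avoid the i servers already taken.
            allDistinct *= (double) (servers - i) / servers;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        System.out.printf("%.3f%n", atLeastTwoShare(10, 100));
    }
}
```

With those assumptions the result for 10 regions on 100 servers is roughly 
0.37, so a collision is far from rare.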

I now need to figure out why Jenkins doesn't like my patches. Will resubmit 
later today.

> CopyTable should provide an option to shuffle the mapper tasks
> --------------------------------------------------------------
>
>                 Key: HBASE-11562
>                 URL: https://issues.apache.org/jira/browse/HBASE-11562
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.99.0, 0.94.20, 0.98.4
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Jean-Marc Spaggiari
>         Attachments: HBASE-11562-v0-trunk.patch, HBASE-11562-v1-trunk.patch
>
>
> When doing a copy table from a table with a lot of regions to a table with 
> far fewer regions, on a cluster with a limited number of mappers, since map 
> tasks are ordered by key, tasks will first run for the first few regions and 
> will hotspot a single region server on the destination side.
> To avoid this, we should submit the map tasks in a random order.
> This JIRA is to add this option to CopyTable and TableInputFormat.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
