[ https://issues.apache.org/jira/browse/HBASE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070314#comment-14070314 ]
Jean-Marc Spaggiari commented on HBASE-11562:
---------------------------------------------
{quote}
This sounds like a useful detail. Make it the default behavior?
{quote}
This will change the default behaviour. That's why I preferred to keep it false
by default, preserving the current behaviour while still allowing it. Maybe we
can turn it on by default in 0.99 and leave it off by default in the other
versions?
{quote}
In practice, shouldn't a well balanced table have fairly random region ->
RegionServer distribution?
{quote}
Yes, it should. On a 100-node cluster with a 3,000-region table, when you pick
up the first 300 regions you will most probably have about 3 regions on each
server. So all 100 servers will send a lot of puts. Now, if the destination is
a 10-region table, all those 300 source regions will most probably write into
the same destination region, and therefore to the same region server. So the
problem is on the destination side more than in the source table distribution.
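To make the arithmetic concrete, here is a small stand-alone Java model (a sketch, not HBase code). It assumes both tables split the same key space evenly, so source region i maps to destination region i * 10 / 3000, and that exactly 300 mappers run in the first wave. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy model, not HBase code. Assumption: both tables split the same key
// space evenly, so source region i covers the keys of destination region
// i * DEST_REGIONS / SOURCE_REGIONS. Real split points are rarely this uniform.
final class HotspotModel {
  static final int SOURCE_REGIONS = 3000;
  static final int DEST_REGIONS = 10;
  static final int FIRST_WAVE = 300; // assumed number of mappers running at once

  // Destination region that receives the puts produced by a source region.
  static int destinationOf(int sourceRegion) {
    return sourceRegion * DEST_REGIONS / SOURCE_REGIONS;
  }

  // How many first-wave map tasks write into each destination region.
  static int[] firstWaveLoad(List<Integer> taskOrder) {
    int[] load = new int[DEST_REGIONS];
    for (int k = 0; k < FIRST_WAVE; k++) {
      load[destinationOf(taskOrder.get(k))]++;
    }
    return load;
  }

  // Map tasks in key order, i.e. the current behaviour.
  static List<Integer> orderedTasks() {
    List<Integer> tasks = new ArrayList<>();
    for (int i = 0; i < SOURCE_REGIONS; i++) {
      tasks.add(i);
    }
    return tasks;
  }

  // Same tasks in a random order (seeded so runs are reproducible).
  static List<Integer> shuffledTasks(long seed) {
    List<Integer> tasks = orderedTasks();
    Collections.shuffle(tasks, new Random(seed));
    return tasks;
  }

  public static void main(String[] args) {
    System.out.println("ordered:  " + Arrays.toString(firstWaveLoad(orderedTasks())));
    System.out.println("shuffled: " + Arrays.toString(firstWaveLoad(shuffledTasks(42L))));
  }
}
```

In this model the key-ordered first wave sends all 300 tasks to destination region 0, while the shuffled wave spreads them over the ten regions, roughly 30 each on average.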
{quote}
You see logically adjacent regions piling up on the same RS?
{quote}
Not necessarily, but since we balance per cluster and not per table, there is
still a chance that 2 of the first 10 regions end up on the same region server,
which will make things even worse.
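A rough way to put a number on "there is still a chance" is the birthday problem. The sketch below assumes, as a simplification, that each region lands on a server independently and uniformly at random; the real balancer evens out region counts per server rather than placing regions randomly, so treat the result as an order-of-magnitude estimate. The class name is hypothetical.

```java
import java.util.Locale;

// Birthday-problem estimate. Assumption: each region is placed on a server
// independently and uniformly at random. The real balancer is not random; it
// evens out region counts per server across the cluster, not per table.
final class CollisionOdds {
  // Probability that at least two of `regions` regions share a server.
  static double atLeastTwoShareAServer(int regions, int servers) {
    if (regions > servers) {
      return 1.0; // pigeonhole: a collision is certain
    }
    double allDistinct = 1.0;
    for (int i = 0; i < regions; i++) {
      allDistinct *= (double) (servers - i) / servers;
    }
    return 1.0 - allDistinct;
  }

  public static void main(String[] args) {
    // 10 adjacent regions on a 100-server cluster
    System.out.println(String.format(Locale.ROOT, "%.3f", atLeastTwoShareAServer(10, 100)));
  }
}
```

With 10 regions and 100 servers this prints 0.372, so under that assumption there is better than a one-in-three chance that two of the 10 regions share a region server.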
I now need to figure out why Jenkins doesn't like my patches. Will resubmit
later today.
> CopyTable should provide an option to shuffle the mapper tasks
> --------------------------------------------------------------
>
> Key: HBASE-11562
> URL: https://issues.apache.org/jira/browse/HBASE-11562
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.99.0, 0.94.20, 0.98.4
> Reporter: Jean-Marc Spaggiari
> Assignee: Jean-Marc Spaggiari
> Attachments: HBASE-11562-v0-trunk.patch, HBASE-11562-v1-trunk.patch
>
>
> When doing a CopyTable from a table with a lot of regions to a table with far
> fewer regions, on a cluster with a limited number of mappers, map tasks are
> ordered by key, so the tasks will first run for the first few regions and will
> hotspot a single region server on the destination side.
> To avoid this, we should submit the map tasks in a random order.
> This JIRA is to add this option to CopyTable and TableInputFormat.
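A minimal sketch of the proposed option, assuming the shuffle is applied to the list of input splits before they are handed to the framework. In HBase that list would presumably be the one returned by `TableInputFormatBase.getSplits(JobContext)`; to stay self-contained, the sketch uses a plain `List` and `java.util.Properties` in place of Hadoop's `Configuration`. The property name `example.copytable.shuffle.maps` is hypothetical, and the option defaults to off to keep the current behaviour, as discussed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.Random;

// Sketch only: reorder input splits before they are handed to the scheduler.
// The property name is hypothetical, and the option is off by default so the
// current key-ordered behaviour is kept unless a user opts in.
final class SplitShuffler {
  static final String SHUFFLE_KEY = "example.copytable.shuffle.maps"; // hypothetical

  static <T> List<T> orderSplits(List<T> splits, Properties conf, Random rnd) {
    List<T> result = new ArrayList<>(splits); // leave the caller's list untouched
    if (Boolean.parseBoolean(conf.getProperty(SHUFFLE_KEY, "false"))) {
      Collections.shuffle(result, rnd);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> splits = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      splits.add("region-" + i);
    }
    Properties conf = new Properties();
    System.out.println(orderSplits(splits, conf, new Random(1L))); // key order
    conf.setProperty(SHUFFLE_KEY, "true");
    System.out.println(orderSplits(splits, conf, new Random(1L))); // random order
  }
}
```

Copying the list before shuffling keeps the change local: callers that still rely on key order are unaffected unless they explicitly enable the flag.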
--
This message was sent by Atlassian JIRA
(v6.2#6252)