[ 
https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699325#action_12699325
 ] 

Evgeny Ryabitskiy commented on HBASE-1017:
------------------------------------------

About refactoring

ServerManager keeps these mappings:
 - serverName to serverInfo,
 - serverAddr to serverInfo,
 - serverName to load,
 - load to serverName

1) serverName to load is not necessary if you have serverName to serverInfo
2) All mappings are encapsulated in the ServersInfo class (an inner class of 
ServerManager)
3) ServersInfo has operations for adding, updating and removing HRS 
information
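
To illustrate point 2 and 3, here is a minimal sketch of one class that owns 
all of the mappings and keeps them consistent. It is not the code from the 
patch: the class name ServersInfoSketch, the nested Info type (a stand-in for 
the real server info object) and every method name are hypothetical, and load 
is assumed to be a plain region count.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not the ServersInfo class from the patch.
class ServersInfoSketch {
    // Stand-in for the real server info object (assumption).
    static final class Info {
        final String name;
        final String address;
        final int load; // assumed to be the number of regions served

        Info(String name, String address, int load) {
            this.name = name;
            this.address = address;
            this.load = load;
        }
    }

    private final Map<String, Info> byName = new HashMap<>();
    private final Map<String, Info> byAddress = new HashMap<>();
    // load to serverName, sorted so lowest and highest loads are cheap to find
    private final SortedMap<Integer, Set<String>> namesByLoad = new TreeMap<>();

    // Adds a server, or replaces its previous entry in every mapping.
    synchronized void addOrUpdate(Info info) {
        remove(info.name);
        byName.put(info.name, info);
        byAddress.put(info.address, info);
        namesByLoad.computeIfAbsent(info.load, k -> new HashSet<>()).add(info.name);
    }

    // Removes a server from every mapping; does nothing if it is unknown.
    synchronized void remove(String name) {
        Info old = byName.remove(name);
        if (old == null) {
            return;
        }
        byAddress.remove(old.address);
        Set<String> names = namesByLoad.get(old.load);
        if (names != null) {
            names.remove(name);
            if (names.isEmpty()) {
                namesByLoad.remove(old.load);
            }
        }
    }

    // serverName to load is derived from serverName to serverInfo,
    // so no separate map is kept for it.
    synchronized Integer loadOf(String name) {
        Info info = byName.get(name);
        return info == null ? null : info.load;
    }

    synchronized Info infoForAddress(String address) {
        return byAddress.get(address);
    }

    synchronized Set<String> namesWithLoad(int load) {
        Set<String> names = namesByLoad.get(load);
        return names == null ? new HashSet<>() : new HashSet<>(names);
    }
}
```

Routing every change through addOrUpdate and remove is what keeps the maps 
from drifting apart, which is the point of encapsulating them.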


About Load Balance Algorithm

Previous check: if an HRS's load is more than avgLoadPlusSlop, the HRS is 
overloaded, so close some regions (numToClose = currentRegions - avgLoad).

Added check: if an HRS is the most loaded and the lowest loaded HRS are loaded 
less than avgLoadMinusSlop, then close some regions from the most loaded HRS 
(numToClose = min(currentRegions - avgLoad, (avgLoadMinusSlop - lowestLoad) * 
numLowestLoadedHRS)).
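
The two checks can be sketched as one function. This is only an illustration 
of the rules above, not the patch itself: the class and method names are 
hypothetical, the slop bounds are assumed to be avgLoad * (1 + slop) and 
avgLoad * (1 - slop) without rounding, and the result is assumed to be 
truncated to a whole number of regions.

```java
// Hypothetical sketch of the two checks; names and rounding are assumptions.
class BalanceSketch {
    /**
     * Returns how many regions a server should close.
     *
     * currentRegions         regions on the server being checked
     * avgLoad                average regions per server
     * slop                   allowed deviation as a fraction, e.g. 0.1
     * highestLoad            load of the most loaded server in the cluster
     * lowestLoad             load of the lowest loaded server(s)
     * numLowestLoadedServers how many servers share lowestLoad
     */
    static int regionsToClose(int currentRegions, double avgLoad, double slop,
                              int highestLoad, int lowestLoad,
                              int numLowestLoadedServers) {
        double avgLoadPlusSlop = avgLoad * (1 + slop);
        double avgLoadMinusSlop = avgLoad * (1 - slop);

        // Previous check: the server is above the upper slop bound.
        if (currentRegions > avgLoadPlusSlop) {
            return (int) (currentRegions - avgLoad);
        }

        // Added check: this server is the most loaded one and the lowest
        // loaded servers are below the lower slop bound.
        boolean isMostLoaded = currentRegions >= highestLoad;
        if (isMostLoaded && lowestLoad < avgLoadMinusSlop) {
            double downToAverage = currentRegions - avgLoad;
            double neededByLowest =
                (avgLoadMinusSlop - lowestLoad) * numLowestLoadedServers;
            return (int) Math.max(0, Math.min(downToAverage, neededByLowest));
        }
        return 0;
    }
}
```

With the numbers from the issue description (avg 22.0, slop 0.1), a server 
holding 25 regions trips the previous check and closes 3. A server holding 24 
regions is inside the slop range, but if it is the most loaded one while the 
new server holds only 6, the added check lets it close min(2, 13.8), that is 
2 regions.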



Changes to JUnit for Region Balance:

An assert checks that the loads of all HRS are within the slop range after 
rebalance.

Number of HRS raised from 4 to 10.
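
The kind of condition such an assert verifies might look like the helper 
below. It is a sketch only and does not reproduce the test in the patch: the 
class and method names are hypothetical, the average is computed from the 
given loads, and the bounds are assumed to be left unrounded.

```java
import java.util.List;

// Hypothetical helper; the real JUnit test may compute the bounds differently.
class SlopRangeCheck {
    // True if every server's load lies within avg * (1 - slop) .. avg * (1 + slop).
    static boolean allWithinSlop(List<Integer> loads, double slop) {
        if (loads.isEmpty()) {
            return true;
        }
        double sum = 0;
        for (int load : loads) {
            sum += load;
        }
        double avg = sum / loads.size();
        double lower = avg * (1 - slop);
        double upper = avg * (1 + slop);
        for (int load : loads) {
            if (load < lower || load > upper) {
                return false;
            }
        }
        return true;
    }
}
```

A test could call this with the region counts reported by each HRS once 
balancing has settled and fail if it returns false.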

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, 
> HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, 
> HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch
>
>
> With a 10 node cluster, there were only 9 online nodes.  With about 215 total 
> regions, each of the 9 had around 24 regions (average load is 24).  Slop is 
> 10% so 22 to 26 is the acceptable range.
> Starting up the 10th node, master log showed:
> {code}
> 2008-11-21 15:57:51,521 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received start message from: 72.34.249.210:60020
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Server 72.34.249.219:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Going to close region streamitems,^...@^@^...@^@^AH�;,1225411051632
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Going to close region streamitems,^...@^@^...@^@^...@�Ý,1225411056686
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Going to close region groups,,1222913580957
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Server 72.34.249.213:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Going to close region upgrade,,1226892014784
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Going to close region streamitems,^...@^@^...@^@^...@3^z�,1225411056701
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> Going to close region streamitems,^...@^@^...@^@^@         ^L,1225411049042
> {code}
> The new regionserver received only 6 regions.  This happened because when the 
> 10th came in, average load dropped to 22.  This caused two servers with 25 
> regions (acceptable when avg was 24 but not now) to reassign 3 of their 
> regions each to bring them back down to the average.  Unfortunately all other 
> servers remained within the 10% slop (20 to 24), so they were not overloaded 
> and thus did not reassign any regions.  It was only chance that made even 
> 6 of the regions get reassigned as there could have been exactly 24 on each 
> server, in which case none would have been assigned to the new node.
> This will behave worse on larger clusters when adding a new node has little 
> impact on the avg load/server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
