[
https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------
Attachment: loadbalance2.0.patch
loadbalance2.0.patch is my mega cool low-centralised load balance
algorithm... it is still a prototype, just to show my new ideas :)
It is independent of the other patches here.
The idea:
* Region servers know better which regions to unassign and can make their
own decisions about it
* For such decisions the HRS uses a LoadBalancer thread (see the sketches
after this list)
* To make such decisions the HRS needs to know the current load situation in
the cluster (LoadMetrics)
* The HRS reads the LoadMetrics record from ZK
* If the HRS can't get the LoadMetrics record, it skips that load-balance round
* If the HRS finds out that it is overloaded, it closes some regions
* The Master updates and puts a new LoadMetrics record into ZK at some frequency
* The LoadMetrics record contains: avgLoad, maxLoad, upLoadBound, lowLoadBound,
underloadingFactor
* LoadMetrics is a class with those attributes and can be serialised to bytes
and read back from bytes
* The LoadMetrics record is the data of a special ephemeral znode in ZK,
created by the Master
* The Master still assigns the closed regions to region servers, so balancing
is half-centralised (unassign is distributed, assign is centralised)
* In the future the Master will use a flag in LoadMetrics to stop unassigning
if there are too many closed regions
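Here is a rough sketch of what the LoadMetrics class could look like. It is
only an illustration, not code from the patch: the field names come from the
list above, but the plain DataOutput serialisation and the getter are my
assumptions.
{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch only: field names from the comment above; the real patch may
// serialise the record differently.
public class LoadMetrics {
  private double avgLoad;
  private int maxLoad;
  private int upLoadBound;
  private int lowLoadBound;
  private double underloadingFactor;

  // Serialise the record to bytes so it can be stored as znode data.
  public byte[] toBytes() throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeDouble(avgLoad);
    out.writeInt(maxLoad);
    out.writeInt(upLoadBound);
    out.writeInt(lowLoadBound);
    out.writeDouble(underloadingFactor);
    out.flush();
    return bos.toByteArray();
  }

  // Read a record back from znode data.
  public static LoadMetrics fromBytes(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    LoadMetrics m = new LoadMetrics();
    m.avgLoad = in.readDouble();
    m.maxLoad = in.readInt();
    m.upLoadBound = in.readInt();
    m.lowLoadBound = in.readInt();
    m.underloadingFactor = in.readDouble();
    return m;
  }

  public int getUpLoadBound() { return upLoadBound; }
}
{code}
The Master could then publish a fresh record as the ephemeral znode data, e.g.
zk.create(path, metrics.toBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
CreateMode.EPHEMERAL), with the path being whatever the patch picks.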
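And a rough sketch of the HRS-side LoadBalancer thread. Only the flow is from
the list above (read metrics, skip the round if the record is missing, close
regions if overloaded); the znode path, the interval and the region-server
methods are made up here.
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Stand-in for the few HRegionServer calls the sketch needs (hypothetical).
interface RegionServerOps {
  int getOnlineRegionCount();
  void closeLeastLoadedRegions(int count);
}

public class LoadBalancerThread extends Thread {
  private final ZooKeeper zk;
  private final String metricsZNode;  // e.g. "/hbase/loadmetrics" (made up)
  private final RegionServerOps server;

  public LoadBalancerThread(ZooKeeper zk, String metricsZNode,
      RegionServerOps server) {
    this.zk = zk;
    this.metricsZNode = metricsZNode;
    this.server = server;
  }

  @Override
  public void run() {
    while (!isInterrupted()) {
      try {
        // Read the LoadMetrics record the Master keeps in ZK.
        byte[] data = zk.getData(metricsZNode, false, null);
        LoadMetrics metrics = LoadMetrics.fromBytes(data);
        int myLoad = server.getOnlineRegionCount();
        if (myLoad > metrics.getUpLoadBound()) {
          // Overloaded: close enough regions to get back under the bound;
          // the Master will reassign them (assign stays centralised).
          server.closeLeastLoadedRegions(myLoad - metrics.getUpLoadBound());
        }
      } catch (KeeperException.NoNodeException e) {
        // No LoadMetrics record in ZK: skip this balance round.
      } catch (Exception e) {
        // A real implementation would log and retry.
      }
      try {
        Thread.sleep(10000);  // balance interval, arbitrary for the sketch
      } catch (InterruptedException e) {
        return;
      }
    }
  }
}
{code}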
> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
> Key: HBASE-1017
> URL: https://issues.apache.org/jira/browse/HBASE-1017
> Project: Hadoop HBase
> Issue Type: Improvement
> Affects Versions: 0.19.0
> Reporter: Jonathan Gray
> Assignee: Evgeny Ryabitskiy
> Priority: Minor
> Fix For: 0.20.0
>
> Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch,
> HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch,
> HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch,
> HBASE-1017_v9.patch, loadbalance2.0.patch
>
>
> With a 10 node cluster, there were only 9 online nodes. With about 215 total
> regions, each of the 9 had around 24 regions (average load is 24). Slop is
> 10% so 22 to 26 is the acceptable range.
> Starting up the 10th node, master log showed:
> {code}
> 2008-11-21 15:57:51,521 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received start message from: 72.34.249.210:60020
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Server 72.34.249.219:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Going to close region streamitems,^...@^@^...@^@^AH�;,1225411051632
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Going to close region streamitems,^...@^@^...@^@^...@�Ý,1225411056686
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Going to close region groups,,1222913580957
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Server 72.34.249.213:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Going to close region upgrade,,1226892014784
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Going to close region streamitems,^...@^@^...@^@^...@3^z�,1225411056701
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Going to close region streamitems,^...@^@^...@^@^@ ^L,1225411049042
> {code}
> The new regionserver received only 6 regions. This happened because when the
> 10th came in, average load dropped to 22. This caused two servers with 25
> regions (acceptable when avg was 24 but not now) to reassign 3 of their
> regions each to bring them back down to the average. Unfortunately all other
> servers remained within the 10% slop (20 to 24), so they were not overloaded
> and thus did not reassign any regions. It was only chance that made even
> 6 of the regions get reassigned as there could have been exactly 24 on each
> server, in which case none would have been assigned to the new node.
> This will behave worse on larger clusters when adding a new node has little
> impact on the avg load/server.
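> The slop arithmetic above is easy to reproduce (a quick sketch; the numbers
> come from the log, the class is just for illustration):
> {code}
> public class SlopDemo {
>   public static void main(String[] args) {
>     double avg = 22.0, slop = 0.1;                  // from the master log
>     int low  = (int) Math.ceil(avg * (1 - slop));   // 20
>     int high = (int) Math.floor(avg * (1 + slop));  // 24
>     System.out.println("acceptable range: " + low + " to " + high);
>     // A server with 25 regions is overloaded, one with 24 is not:
>     System.out.println(25 > avg * (1 + slop));      // true
>     System.out.println(24 > avg * (1 + slop));      // false
>   }
> }
> {code}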