[
https://issues.apache.org/jira/browse/HBASE-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129933#comment-15129933
]
stack commented on HBASE-15207:
-------------------------------
Happened again in a new loading. It's hard to make sense of because the logging
overwhelms. Let me address that in a patch first; then I can look at the hang. In
this current case we filled 10 log files of 256MB each... but it looks like we
'recovered'. LB reports:
2016-02-02 17:15:05,123 DEBUG
[ve0524.halxg.cloudera.com,16000,1454461776827_ChoreService_1]
balancer.StochasticLoadBalancer: Finished computing new load balance plan.
Computation took 3761ms to try 217600 different iterations. Found a solution
that moves 31 regions; Going from a computed cost of 341.5035724035313 to a new
cost of 47.8518130393211
And going back over the logs, yeah, I get sessions of log spewing... maybe the
load balancer is ok and it's just this crazy logging phenomenon.
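A minimal sketch of one way to damp this kind of repeated DEBUG message while the hang itself is looked at. This is not the patch for this issue: ThrottledLog and its methods are hypothetical names, not HBase classes, and the clock is passed in as a parameter only so the sketch is easy to exercise.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not an HBase class): lets a hot loop emit a message
 * at most once per interval and counts how many repeats were suppressed.
 */
class ThrottledLog {
  private final long intervalMs;
  // Earliest time (ms) at which the next message may be emitted.
  private final AtomicLong nextAllowedMs = new AtomicLong(Long.MIN_VALUE);
  private final AtomicLong suppressed = new AtomicLong();

  ThrottledLog(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  /** Returns true if the caller should log at time nowMs, false to skip. */
  boolean shouldLog(long nowMs) {
    long next = nextAllowedMs.get();
    // Only one caller wins the compare-and-set for a given interval.
    if (nowMs >= next && nextAllowedMs.compareAndSet(next, nowMs + intervalMs)) {
      return true;
    }
    suppressed.incrementAndGet();
    return false;
  }

  /** Returns the number of skipped messages since the last call and resets it. */
  long drainSuppressed() {
    return suppressed.getAndSet(0);
  }

  /**
   * Simulates perMs identical messages per millisecond for durationMs
   * and returns how many of them would actually be emitted.
   */
  static int countEmitted(long intervalMs, long durationMs, int perMs) {
    ThrottledLog throttle = new ThrottledLog(intervalMs);
    int emitted = 0;
    for (long ms = 0; ms < durationMs; ms++) {
      for (int i = 0; i < perMs; i++) {
        if (throttle.shouldLog(ms)) {
          emitted++;
        }
      }
    }
    return emitted;
  }
}
```

In a real loop the caller would wrap the debug call in a check such as shouldLog(System.currentTimeMillis()) and append drainSuppressed() to the message, so the log still shows how often the condition fired. At ten lines a millisecond with a one-second interval, three seconds of spew comes down to three lines.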
> Stuck balancer
> --------------
>
> Key: HBASE-15207
> URL: https://issues.apache.org/jira/browse/HBASE-15207
> Project: HBase
> Issue Type: Bug
> Components: Balancer
> Affects Versions: 1.2.0
> Reporter: stack
>
> Balancer seems to have gotten stuck in 1.2.0RC1 soon after Master joins
> running cluster (previous Master had been killed by chaos monkey).
> Investigate. At least fix the crazy logging which made me notice the stuck
> balancer.
> Last night my logs filled with this (10x256MB log files):
> ....
> 2016-02-01 11:25:26,958 DEBUG
> [B.defaultRpcServer.handler=9,queue=0,port=16000] balancer.BaseLoadBalancer:
> Lowest locality region server with non zero regions is
> ve0542.halxg.cloudera.com with locality 0.0
> 2016-02-01 11:25:26,958 DEBUG
> [B.defaultRpcServer.handler=9,queue=0,port=16000] balancer.BaseLoadBalancer:
> Lowest locality region index is 0 and its region server contains 1 regions
> ...
> Added by this:
> commit 54028140f4f19a6af81c8c8f29dda0c52491a0c9
> Author: tedyu <[email protected]>
> Date: Thu Aug 13 09:11:59 2015 -0700
> HBASE-13376 Improvements to Stochastic load balancer (Vandana
> Ayyalasomayajula)
> Looks like balancer got stuck. Logging at ten lines a millisecond.
> Here is the lead-up. Nothing in particular jumps out. A rerun doesn't show this.
> {code}
> 2016-01-28 05:56:22,572 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,572 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,572 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,572 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Lowest locality region server with non zero
> regions is ve0540.halxg.cloudera.com with locality 0.0
> 2016-01-28 05:56:22,572 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Lowest locality region index is 0 and its region
> server contains 1 regions
> 2016-01-28 05:56:22,573 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Lowest locality region server with non zero
> regions is ve0540.halxg.cloudera.com with locality 0.0
> 2016-01-28 05:56:22,573 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Lowest locality region index is 0 and its region
> server contains 1 regions
> 2016-01-28 05:56:22,573 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1]
> balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
> ....
> {code}
> Nothing else is happening on this master.
> Happens just after a Master joins cluster after being killed by a monkey.