[ 
https://issues.apache.org/jira/browse/HBASE-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-15207:
--------------------------
    Description: 
Balancer seems to have gotten stuck in 1.2.0RC1 soon after Master joins running 
cluster (previous Master had been killed by chaos monkey). Investigate. At 
least fix the crazy logging which made me notice the stuck balancer.

Last night my logs filled with this (10x256MB log files):

....
2016-02-01 11:25:26,958 DEBUG [B.defaultRpcServer.handler=9,queue=0,port=16000] 
balancer.BaseLoadBalancer: Lowest locality region server with non zero regions 
is ve0542.halxg.cloudera.com with locality 0.0
2016-02-01 11:25:26,958 DEBUG [B.defaultRpcServer.handler=9,queue=0,port=16000] 
balancer.BaseLoadBalancer:  Lowest locality region index is 0 and its region 
server contains 1 regions
...


Added by this:

commit 54028140f4f19a6af81c8c8f29dda0c52491a0c9
Author: tedyu <[email protected]>
Date:   Thu Aug 13 09:11:59 2015 -0700

    HBASE-13376 Improvements to Stochastic load balancer (Vandana 
Ayyalasomayajula)

Looks like balancer got stuck. Logging at ten lines a millisecond.

Here is lead up. Nothing in particular jumps out. Rerun doesn't show this.

{code}
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Lowest locality region server with non zero regions 
is ve0540.halxg.cloudera.com with locality 0.0
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer:  Lowest locality region index is 0 and its region 
server contains 1 regions
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Lowest locality region server with non zero regions 
is ve0540.halxg.cloudera.com with locality 0.0
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer:  Lowest locality region index is 0 and its region 
server contains 1 regions
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
....
{code}

Nothing else is happening on this master

Happens just after a Master joins cluster after being killed by a monkey.

  was:
Last night my logs filled with this (10x256MB log files):

....
2016-02-01 11:25:26,958 DEBUG [B.defaultRpcServer.handler=9,queue=0,port=16000] 
balancer.BaseLoadBalancer: Lowest locality region server with non zero regions 
is ve0542.halxg.cloudera.com with locality 0.0
2016-02-01 11:25:26,958 DEBUG [B.defaultRpcServer.handler=9,queue=0,port=16000] 
balancer.BaseLoadBalancer:  Lowest locality region index is 0 and its region 
server contains 1 regions
...


Added by this:

commit 54028140f4f19a6af81c8c8f29dda0c52491a0c9
Author: tedyu <[email protected]>
Date:   Thu Aug 13 09:11:59 2015 -0700

    HBASE-13376 Improvements to Stochastic load balancer (Vandana 
Ayyalasomayajula)

Looks like balancer got stuck. Logging at ten lines a millisecond.

Here is lead up. Nothing in particular jumps out. Rerun doesn't show this.

{code}
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Lowest locality region server with non zero regions 
is ve0540.halxg.cloudera.com with locality 0.0
2016-01-28 05:56:22,572 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer:  Lowest locality region index is 0 and its region 
server contains 1 regions
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Lowest locality region server with non zero regions 
is ve0540.halxg.cloudera.com with locality 0.0
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer:  Lowest locality region index is 0 and its region 
server contains 1 regions
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
2016-01-28 05:56:22,573 DEBUG 
[ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
....
{code}

Nothing else is happening on this master

Happens just after a Master joins cluster after being killed by a monkey.


> Stuck balancer
> --------------
>
>                 Key: HBASE-15207
>                 URL: https://issues.apache.org/jira/browse/HBASE-15207
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions: 1.2.0
>            Reporter: stack
>
> Balancer seems to have gotten stuck in 1.2.0RC1 soon after Master joins 
> running cluster (previous Master had been killed by chaos monkey). 
> Investigate. At least fix the crazy logging which made me notice the stuck 
> balancer.
> Last night my logs filled with this (10x256MB log files):
> ....
> 2016-02-01 11:25:26,958 DEBUG 
> [B.defaultRpcServer.handler=9,queue=0,port=16000] balancer.BaseLoadBalancer: 
> Lowest locality region server with non zero regions is 
> ve0542.halxg.cloudera.com with locality 0.0
> 2016-02-01 11:25:26,958 DEBUG 
> [B.defaultRpcServer.handler=9,queue=0,port=16000] balancer.BaseLoadBalancer:  
> Lowest locality region index is 0 and its region server contains 1 regions
> ...
> Added by this:
> commit 54028140f4f19a6af81c8c8f29dda0c52491a0c9
> Author: tedyu <[email protected]>
> Date:   Thu Aug 13 09:11:59 2015 -0700
>     HBASE-13376 Improvements to Stochastic load balancer (Vandana 
> Ayyalasomayajula)
> Looks like balancer got stuck. Logging at ten lines a millisecond.
> Here is lead up. Nothing in particular jumps out. Rerun doesn't show this.
> {code}
> 2016-01-28 05:56:22,572 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,572 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,572 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,572 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Lowest locality region server with non zero 
> regions is ve0540.halxg.cloudera.com with locality 0.0
> 2016-01-28 05:56:22,572 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer:  Lowest locality region index is 0 and its region 
> server contains 1 regions
> 2016-01-28 05:56:22,573 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Lowest locality region server with non zero 
> regions is ve0540.halxg.cloudera.com with locality 0.0
> 2016-01-28 05:56:22,573 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer:  Lowest locality region index is 0 and its region 
> server contains 1 regions
> 2016-01-28 05:56:22,573 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0526.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0532.halxg.cloudera.com had 0 regions.
> 2016-01-28 05:56:22,573 DEBUG 
> [ve0524.halxg.cloudera.com,16000,1453988766013_ChoreService_1] 
> balancer.BaseLoadBalancer: Server ve0538.halxg.cloudera.com had 0 regions.
> ....
> {code}
> Nothing else is happening on this master
> Happens just after a Master joins cluster after being killed by a monkey.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to