[ 
https://issues.apache.org/jira/browse/HBASE-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951430#comment-16951430
 ] 

Michael Stack commented on HBASE-23173:
---------------------------------------

Seems stuck. After cutting the server count down radically, I get:


{code}
2019-10-14 23:55:28,071 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished 
computing new load balance plan. Computation took PT30.001S to try 585914 
different iterations.  Found a solution that moves 600 regions; Going from a 
computed cost of 31218.140807983513 to a new cost of 5418.578319424544
2019-10-14 23:55:28,121 WARN 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: 
calculatedMaxSteps:31021600 for loadbalancer's stochastic walk is larger than 
maxSteps:30000. Hence load balancing may not work well. Setting parameter 
"hbase.master.balancer.stochastic.runMaxSteps" to true can overcome this 
issue.(This config change does not require service restart)
2019-10-14 23:55:28,122 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: start 
StochasticLoadBalancer.balancer, initCost=110149.93518418419, 
functionCost=RegionCountSkewCostFunction : (500.0, 0.12399188017775821); 
PrimaryRegionCountSkewCostFunction : (500.0, 0.12618778280542983); 
MoveCostFunction : (7.0, 0.0); ServerLocalityCostFunction : (25.0, 
0.5599124140346061); RackLocalityCostFunction : (15.0, 0.6030048912508241); 
TableSkewCostFunction : (35.0, 0.03112669881630863); 
RegionReplicaHostCostFunction : (100000.0, 1.0); RegionReplicaRackCostFunction 
: (10000.0, 1.0); ReadRequestCostFunction : (5.0, 0.0); 
WriteRequestCostFunction : (5.0, 0.0); MemStoreSizeCostFunction : (5.0, 0.0); 
StoreFileCostFunction : (5.0, 0.14260690287861114);  computedMaxSteps: 1000000
2019-10-14 23:55:28,157 ERROR org.apache.hadoop.hbase.ScheduledChore: Caught 
error
java.lang.ArrayIndexOutOfBoundsException
2019-10-14 23:55:28,169 INFO org.apache.hadoop.hbase.ScheduledChore: Chore: 
hbasemn001.sp07.siri.apple.com,16000,1571096002182-ClusterStatusChore missed 
its start time
{code}


... no stack trace this time.
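
A note on reading the "start StochasticLoadBalancer.balancer" line in the log above: each entry is a (multiplier, cost) pair, and the logged initCost appears to be the sum of multiplier * cost over all listed functions. That relationship is an assumption inferred from the numbers, not checked against the balancer source. A minimal sketch to verify it, with the values copied from the log (the dictionary and helper names are illustrative only):

```python
# (multiplier, cost) pairs copied from the functionCost entries in the log above.
function_costs = {
    "RegionCountSkewCostFunction": (500.0, 0.12399188017775821),
    "PrimaryRegionCountSkewCostFunction": (500.0, 0.12618778280542983),
    "MoveCostFunction": (7.0, 0.0),
    "ServerLocalityCostFunction": (25.0, 0.5599124140346061),
    "RackLocalityCostFunction": (15.0, 0.6030048912508241),
    "TableSkewCostFunction": (35.0, 0.03112669881630863),
    "RegionReplicaHostCostFunction": (100000.0, 1.0),
    "RegionReplicaRackCostFunction": (10000.0, 1.0),
    "ReadRequestCostFunction": (5.0, 0.0),
    "WriteRequestCostFunction": (5.0, 0.0),
    "MemStoreSizeCostFunction": (5.0, 0.0),
    "StoreFileCostFunction": (5.0, 0.14260690287861114),
}

LOGGED_INIT_COST = 110149.93518418419  # initCost value from the same log line


def weighted_total(costs):
    """Sum multiplier * cost. Assumes this is how initCost is formed."""
    return sum(multiplier * cost for multiplier, cost in costs.values())


total = weighted_total(function_costs)

# Contribution of the two region-replica functions on their own.
replica = sum(
    multiplier * cost
    for name, (multiplier, cost) in function_costs.items()
    if name.startswith("RegionReplica")
)

print(f"weighted total: {total:.6f}")
print(f"logged initCost: {LOGGED_INIT_COST:.6f}")
print(f"replica share of total: {replica / total:.4%}")
```

If the assumption holds, the weighted total agrees with the logged initCost, and nearly all of it comes from RegionReplicaHostCostFunction and RegionReplicaRackCostFunction, which both sit at cost 1.0 with the largest multipliers.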

> ArrayIndexOutOfBoundsException in BaseLoadBalancer$Cluster.removeRegion
> -----------------------------------------------------------------------
>
>                 Key: HBASE-23173
>                 URL: https://issues.apache.org/jira/browse/HBASE-23173
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: cdh6.3 1.2++
>            Reporter: Michael Stack
>            Priority: Major
>
> 175 nodes with 12237 regions.
> {code}
> 2019-10-14 23:45:47,823 INFO 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: start 
> StochasticLoadBalancer.balancer, initCost=110946.81226093257, 
> functionCost=RegionCountSkewCostFunction : (500.0, 0.9166818227745739); 
> PrimaryRegionCountSkewCostFunction : (500.0, 0.9166048407040664); 
> MoveCostFunction : (7.0, 0.0); ServerLocalityCostFunction : (25.0, 
> 0.5597851965798261); RackLocalityCostFunction : (15.0, 0.5811675989545179); 
> TableSkewCostFunction : (35.0, 0.08287855195593785); 
> RegionReplicaHostCostFunction : (100000.0, 1.0); 
> RegionReplicaRackCostFunction : (10000.0, 1.0); ReadRequestCostFunction : 
> (5.0, 0.0); WriteRequestCostFunction : (5.0, 0.0); MemStoreSizeCostFunction : 
> (5.0, 0.0); StoreFileCostFunction : (5.0, 0.9112071951944016);  
> computedMaxSteps: 1000000
> 2019-10-14 23:45:47,933 ERROR org.apache.hadoop.hbase.ScheduledChore: Caught 
> error
> java.lang.ArrayIndexOutOfBoundsException: Index 145 out of bounds for length 
> 145
>       at 
> org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.removeRegion(BaseLoadBalancer.java:873)
>       at 
> org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.doAction(BaseLoadBalancer.java:716)
>       at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:406)
>       at 
> org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:317)
>       at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1663)
>       at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1580)
>       at 
> org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:49)
>       at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
>       at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>       at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>       at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
