[ 
https://issues.apache.org/jira/browse/HBASE-25973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-25973:
--------------------------------
    Description: 
At the beginning of a run, the balancer logs at INFO level:

{code}
balancer.StochasticLoadBalancer: start StochasticLoadBalancer.balancer, 
initCost=277.3479243125063, functionCost=RegionCountSkewCostFunction : (500.0, 
0.3749771215224234); ServerLocalityCostFunction : (25.0, 0.5807483226644186); 
RackLocalityCostFunction : (15.0, 0.0); TableSkewCostFunction : (1000.0, 
0.0019704142954972883); StoreFileCostFunction : (200.0, 0.3668512059459341);  
computedMaxSteps: 42270438200
{code}
The cost is reported without context, so it is hard for an operator to tell how 
unbalanced the balancer considers the cluster to be, or how much progress it is making.
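To illustrate the kind of context that is missing: the start-of-run numbers are self-consistent if each functionCost tuple is read as (multiplier, cost) and initCost as the sum of multiplier * cost, because 500 * 0.37498 + 25 * 0.58075 + 15 * 0.0 + 1000 * 0.00197 + 200 * 0.36685 adds up to about 277.348. The Java sketch below is a hypothetical helper, not HBase code; `BalancerCostBreakdown` and its methods are made-up names. It assumes each cost is normalized to [0, 1], so that the sum of the multipliers (1740 here) is the worst possible total, and it reports each function's share of the total along with the total relative to that maximum.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of HBase): breaks down the functionCost field of the
 * "start StochasticLoadBalancer.balancer" log line, assuming each entry has the form
 * "Name : (multiplier, cost)" with cost normalized to [0, 1].
 */
final class BalancerCostBreakdown {
  // Matches one entry such as "RegionCountSkewCostFunction : (500.0, 0.3749771215224234)".
  private static final Pattern ENTRY =
      Pattern.compile("(\\w+) : \\(([-0-9.Ee]+), ([-0-9.Ee]+)\\)");

  /** Parses entries into name -> {multiplier, cost}, preserving log order. */
  static Map<String, double[]> parse(String functionCost) {
    Map<String, double[]> entries = new LinkedHashMap<>();
    Matcher m = ENTRY.matcher(functionCost);
    while (m.find()) {
      entries.put(m.group(1), new double[] {
          Double.parseDouble(m.group(2)), Double.parseDouble(m.group(3))});
    }
    return entries;
  }

  /** Weighted total: sum of multiplier * cost over all entries. */
  static double weightedTotal(Map<String, double[]> entries) {
    double total = 0;
    for (double[] e : entries.values()) {
      total += e[0] * e[1];
    }
    return total;
  }

  /** Operator-facing summary: total vs. worst case, plus each function's share. */
  static String describe(String functionCost) {
    Map<String, double[]> entries = parse(functionCost);
    double total = weightedTotal(entries);
    double maxTotal = 0;
    for (double[] e : entries.values()) {
      maxTotal += e[0]; // worst case if every normalized cost were 1.0
    }
    StringBuilder sb = new StringBuilder(String.format(Locale.ROOT,
        "cost %.2f of a possible %.1f (%.1f%% of maximum)", total, maxTotal,
        maxTotal > 0 ? 100.0 * total / maxTotal : 0.0));
    for (Map.Entry<String, double[]> e : entries.entrySet()) {
      double weighted = e.getValue()[0] * e.getValue()[1];
      double share = total > 0 ? 100.0 * weighted / total : 0.0;
      sb.append(String.format(Locale.ROOT, "; %s %.1f%%", e.getKey(), share));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // functionCost field copied from the start-of-run log line above.
    String functionCost = "RegionCountSkewCostFunction : (500.0, 0.3749771215224234); "
        + "ServerLocalityCostFunction : (25.0, 0.5807483226644186); "
        + "RackLocalityCostFunction : (15.0, 0.0); "
        + "TableSkewCostFunction : (1000.0, 0.0019704142954972883); "
        + "StoreFileCostFunction : (200.0, 0.3668512059459341);";
    System.out.println(describe(functionCost));
  }
}
```

Read this way, the example run is at roughly 16% of the worst case, and about two thirds of the cost comes from RegionCountSkewCostFunction. A line like that gives an operator something to compare across runs.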

For a large cluster the calculation can take a long time, so we also need to let 
the operator know that it may run for up to the maximum running time before it 
completes.
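One way to tell the operator up front which limit will end the search is to compare the step budget with the time limit. The sketch below is hypothetical (`BalancerRunEstimate` is not an HBase class) and needs an iteration rate as input. The example assumes the rate from the end-of-run log (1036409 iterations in about 2400 seconds, roughly 432 per second) and a 40-minute limit; both may not match the run that logged computedMaxSteps: 42270438200. At that rate, exhausting the step budget would take on the order of 98 million seconds (about three years), so the run is bounded by time, not steps.

```java
import java.time.Duration;
import java.util.Locale;

/**
 * Hypothetical sketch (not HBase code): decide at the start of a run whether the
 * search will be cut off by the step budget or by the maximum running time,
 * given an iteration rate supplied by the caller.
 */
final class BalancerRunEstimate {
  /** True if the time limit is reached before maxSteps at the given rate. */
  static boolean isTimeBounded(long maxSteps, double stepsPerSecond, Duration maxRunningTime) {
    // Seconds needed to exhaust the whole step budget at the assumed rate.
    double secondsForAllSteps = maxSteps / stepsPerSecond;
    return secondsForAllSteps > maxRunningTime.toMillis() / 1000.0;
  }

  /** Start-of-run message stating which limit is expected to end the search. */
  static String startMessage(long maxSteps, double stepsPerSecond, Duration maxRunningTime) {
    if (isTimeBounded(maxSteps, stepsPerSecond, maxRunningTime)) {
      long expectedSteps = (long) (stepsPerSecond * maxRunningTime.toMillis() / 1000.0);
      return String.format(Locale.ROOT,
          "Balancer will search for up to %d minutes (time limit); "
              + "expecting about %d of %d max steps",
          maxRunningTime.toMinutes(), expectedSteps, maxSteps);
    }
    return String.format(Locale.ROOT,
        "Balancer will stop after %d steps, before the %d minute time limit",
        maxSteps, maxRunningTime.toMinutes());
  }

  public static void main(String[] args) {
    // Assumed rate: iterations / seconds from the end-of-run log line.
    double stepsPerSecond = 1036409 / 2400.006;
    System.out.println(startMessage(42270438200L, stepsPerSecond, Duration.ofMinutes(40)));
  }
}
```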

At the end of the computation, it logs:
{code}
balancer.StochasticLoadBalancer: Finished computing new load balance plan. 
Computation took PT40M0.006S to try 1036409 different iterations. Found a 
solution that moves 161926 regions; Going from a computed cost of 
118.75715593924485 to a new cost of 1.5509126920967042
{code}
The time to compute the plan is also printed as an ISO-8601 duration 
(PT40M0.006S), which is not human readable. We also need to make clear to the 
operator that the balancer only submits the plan; completing the region moves is 
up to plan execution.
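A sketch of a friendlier end-of-run message, with the same caveats: `BalancerPlanSummary` is a hypothetical name and the wording is only one option. It formats the `java.time.Duration` by hand with the `toMinutesPart()`-style accessors (available since Java 9) instead of relying on `Duration.toString()`, reports the relative cost reduction, and states that the plan has only been submitted.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical sketch (not HBase code) of an operator-friendly end-of-run summary. */
final class BalancerPlanSummary {
  /** Formats e.g. PT40M0.006S as "40m 0.006s" and PT1H2M5S as "1h 2m 5.000s". */
  static String humanReadable(Duration d) {
    long hours = d.toHours();
    int minutes = d.toMinutesPart();
    int seconds = d.toSecondsPart();
    int millis = d.toMillisPart();
    StringBuilder sb = new StringBuilder();
    if (hours > 0) {
      sb.append(hours).append("h ");
    }
    if (hours > 0 || minutes > 0) {
      sb.append(minutes).append("m ");
    }
    sb.append(String.format(Locale.ROOT, "%d.%03ds", seconds, millis));
    return sb.toString();
  }

  /** Builds the summary; reduction is relative to the initial computed cost. */
  static String summary(Duration took, long iterations, long regionsToMove,
      double initialCost, double newCost) {
    double reductionPct =
        initialCost > 0 ? 100.0 * (initialCost - newCost) / initialCost : 0.0;
    return String.format(Locale.ROOT,
        "Finished computing new load balance plan in %s after %d iterations: "
            + "moving %d regions takes the computed cost from %.2f to %.2f (%.1f%% lower). "
            + "The plan has been submitted; the region moves are completed by plan "
            + "execution, not by the balancer.",
        humanReadable(took), iterations, regionsToMove, initialCost, newCost, reductionPct);
  }

  public static void main(String[] args) {
    // Values copied from the end-of-run log line above.
    System.out.println(summary(Duration.parse("PT40M0.006S"), 1036409, 161926,
        118.75715593924485, 1.5509126920967042));
  }
}
```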

 

> Balancer should explain progress in a better way in log
> -------------------------------------------------------
>
>                 Key: HBASE-25973
>                 URL: https://issues.apache.org/jira/browse/HBASE-25973
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
