sunhelly commented on pull request #3260: URL: https://github.com/apache/hbase/pull/3260#issuecomment-840337094
> I am not sure if this Jira is needed. First of all, the cost function should be implemented independently for tuning purposes. If this function is not needed, the operator can turn it off by setting its weight to 0.
> Secondly, as I understand it, the example would cause a problem no matter how many tables we have. I have another Jira to improve this cost function, https://issues.apache.org/jira/browse/HBASE-25739, which completely rewrites the implementation and fixes the problem completely.

Hi @clarax, thanks for your attention.

I think TableSkewCostFunction can only make each table's regions roughly balanced, not fully balanced. For example, suppose there are two tables on the cluster and the overall cluster distribution is [10,10,10,10]. If one table's distribution is [10,0,10,0] and the other's is [0,10,0,10], then the cost of TableSkewCostFunction is 10/30. But when one table's distribution is [2,8,2,8] and the other's is [8,2,8,2], the cost is 6/30, which is smaller than the previous value and lowers the likelihood that the balancer generates actions. Even if actions are generated afterwards, how does the balancer know which table is the most unbalanced? Balancing by table can ensure that every table's regions are distributed evenly.

I hope I have understood your ideas correctly; I'm a little confused about some points. Can you tell me whether you have used balance by table in your clusters, and how many tables have a region count smaller than the online RS count?

Thanks.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
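The two costs quoted in the comment (10/30 and 6/30) can be reproduced with a short sketch. It assumes the cost is the sum of each table's maximum per-server region count, scaled linearly between `totalRegions / numServers` and `totalRegions`. That formula is an assumption chosen because it matches the quoted figures. The class name `TableSkewCostSketch` and its methods are hypothetical and are not HBase's actual code.

```java
import java.util.Arrays;

public class TableSkewCostSketch {

    // Assumed linear scaling of value into [0, 1] between min and max.
    static double scale(double min, double max, double value) {
        if (max <= min || value <= min) {
            return 0.0;
        }
        return Math.max(0.0, Math.min(1.0, (value - min) / (max - min)));
    }

    // regionsPerTable[t][s] = number of regions of table t hosted on server s.
    // Assumed cost: sum over tables of the largest per-server count, scaled.
    static double cost(int[][] regionsPerTable) {
        int numServers = regionsPerTable[0].length;
        int totalRegions = 0;
        int sumOfPerTableMax = 0;
        for (int[] perServer : regionsPerTable) {
            totalRegions += Arrays.stream(perServer).sum();
            sumOfPerTableMax += Arrays.stream(perServer).max().getAsInt();
        }
        double min = (double) totalRegions / numServers; // best case
        double max = totalRegions;                       // worst case
        return scale(min, max, sumOfPerTableMax);
    }

    public static void main(String[] args) {
        // Each table occupies only two of the four servers: (20 - 10) / 30.
        int[][] fullySkewed = {{10, 0, 10, 0}, {0, 10, 0, 10}};
        // Each table is still uneven, but less so: (16 - 10) / 30.
        int[][] partlySkewed = {{2, 8, 2, 8}, {8, 2, 8, 2}};

        System.out.println("fully skewed cost:  " + cost(fullySkewed));
        System.out.println("partly skewed cost: " + cost(partlySkewed));
    }
}
```

Under this assumption, the second layout scores lower even though neither table is evenly spread, which is the behaviour the comment is concerned about.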
