[
https://issues.apache.org/jira/browse/BOOKKEEPER-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436606#comment-13436606
]
Aniruddha commented on BOOKKEEPER-363:
--------------------------------------
A couple of issues I found while implementing this:
1) Hedwig hubs only update the load while claiming topics. We should also
update the load while releasing topics.
2) HubLoad is not thread safe and is subject to race conditions when handling
(1). Instead of providing a setNumTopics(), we should have incrementNumTopics()
and decrementNumTopics(), and update the load periodically (perhaps as a side
effect of a successful rebalance).
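To illustrate point (2), here is a minimal sketch of a thread-safe topic
counter. It is not the actual HubLoad class: the class name HubLoadCounter and
getNumTopics() are hypothetical, and only incrementNumTopics() and
decrementNumTopics() correspond to the methods proposed above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the real Hedwig HubLoad class.
final class HubLoadCounter {
    // AtomicLong makes each update a single atomic operation, so concurrent
    // claim and release callbacks cannot overwrite each other's changes.
    private final AtomicLong numTopics = new AtomicLong();

    // Called when this hub claims a topic.
    long incrementNumTopics() {
        return numTopics.incrementAndGet();
    }

    // Called when this hub releases a topic. The count is clamped at zero
    // so that a duplicate release cannot drive it negative.
    long decrementNumTopics() {
        return numTopics.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    // Value that a periodic task could publish as the hub's load.
    long getNumTopics() {
        return numTopics.get();
    }
}
```

With a setter, two threads that read the same old value and each write back
their own result can lose an update; relative increments and decrements avoid
that read-modify-write gap.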
(1) also affects the case where a new hub joins a balanced cluster while new
topic ownership requests are coming in parallel. Every hub will choose this new
hub as the least loaded node, so it will get ownership of a lot of topics. This
would increase its reported load to a large value and, since the load is not
updated on release, this node would never claim any more topics. Or perhaps
I'm missing something.
To give a high-level overview of the implementation, we introduce a
rebalanceCluster() function in the HubServerManager interface. This takes in a
tolerance percentage and the maximum load to shed per call (to make sure you
don't suddenly release a lot of topics). We also add a new class called
TopicBasedLoadShedder that sheds load by releasing topics. It calculates the
average load on the cluster from the load data reported in ZooKeeper, checks
whether the number of topics the current hub owns is more than
average + average * tolerance percentage / 100 and, if so, releases enough
topics to reach the average. Any feedback would be highly appreciated.
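The shedding rule described above can be sketched roughly as follows. This is
not the TopicBasedLoadShedder code: the class LoadShedSketch, the method
topicsToRelease() and its parameters are hypothetical, and the sketch assumes
that load is measured as a topic count and that the list of reported loads
includes the current hub.

```java
import java.util.List;

// Hypothetical sketch of the shedding arithmetic only; it does not talk to
// ZooKeeper or release any topics.
final class LoadShedSketch {
    // Returns how many topics this hub should release in one call.
    static long topicsToRelease(long myTopics, List<Long> hubLoads,
                                double tolerancePercent, long maxToShed) {
        if (hubLoads.isEmpty()) {
            return 0;
        }
        long total = 0;
        for (long load : hubLoads) {
            total += load;
        }
        double average = (double) total / hubLoads.size();
        // Shed only above average + average * tolerance percentage / 100.
        double threshold = average + average * tolerancePercent / 100.0;
        if (myTopics <= threshold) {
            return 0;
        }
        // Release down to the average (rounded up), capped per call.
        long excess = myTopics - (long) Math.ceil(average);
        return Math.max(0, Math.min(excess, maxToShed));
    }
}
```

For example, with reported loads of 100, 20 and 30 topics, the average is 50
and a 10 percent tolerance gives a threshold of 55. The hub owning 100 topics
is 50 above the average, so with a cap of 25 it would release 25 topics in
this call and the rest in later calls.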
> Re-distributing topics among newly added hubs.
> ----------------------------------------------
>
> Key: BOOKKEEPER-363
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-363
> Project: Bookkeeper
> Issue Type: Bug
> Components: hedwig-server
> Reporter: Aniruddha
>
> When a new hub is added to an already existing hedwig cluster, that hub
> should pick up some of the topics. Currently the mechanism hedwig provides is
> to configure the time for which a topic is retained. A better approach might
> be to run a re-balancer thread that periodically checks if topics are
> distributed evenly among hubs and if not, releases some topics to balance the
> load.