[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436606#comment-13436606
 ] 

Aniruddha commented on BOOKKEEPER-363:
--------------------------------------

A couple of issues I found while implementing this: 
1) Hedwig hubs only update their load when claiming topics. We should also 
update the load when releasing topics. 
2) HubLoad is not thread safe and hits race conditions once (1) is handled. 
Instead of providing a setNumTopics(), we should have incrementNumTopics() and 
decrementNumTopics(), and update the load periodically (perhaps as a side 
effect of a successful rebalance). 
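
To make (2) concrete, here is a minimal sketch of what a thread-safe counter 
could look like. HubLoadCounter is a hypothetical name used only for 
illustration; it is not the existing HubLoad class, and the clamping at zero 
is my assumption rather than anything decided in this issue. 

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the existing HubLoad class in hedwig-server.
final class HubLoadCounter {
    // Atomic counter so concurrent claim/release callbacks cannot lose updates.
    private final AtomicLong numTopics = new AtomicLong(0);

    // Called when this hub successfully claims a topic.
    long incrementNumTopics() {
        return numTopics.incrementAndGet();
    }

    // Called when this hub releases a topic.
    // Assumption: clamp at zero so a duplicate release cannot go negative.
    long decrementNumTopics() {
        return numTopics.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    // Snapshot read, e.g. for the periodic load update.
    long getNumTopics() {
        return numTopics.get();
    }
}
```

With this shape the claim and release paths never overwrite each other's 
value, which is what a plain setNumTopics() cannot guarantee. 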

(1) also affects the case where a new hub joins a balanced cluster while new 
topic ownership requests are coming in concurrently. Every hub will choose the 
new hub as the least loaded node, so it will take ownership of a lot of topics. 
That would push its reported load to a large value, and the node would then 
never claim any more topics. Or perhaps I'm missing something. 

To give a high-level overview of the implementation, we introduce a 
rebalanceCluster() function in the HubServerManager interface. It takes a 
tolerance percentage and the maximum load to shed per call (to make sure we 
don't suddenly release a lot of topics). We also add a new class called 
TopicBasedLoadShedder that sheds load by releasing topics. It calculates the 
average load on the cluster from the load data reported in ZooKeeper, checks 
whether the number of topics the current hub owns is more than 
average + average * tolerance percentage / 100 and, if so, releases enough 
topics to reach the average. Any feedback would be highly appreciated.
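
The shedding calculation described above could be sketched roughly as follows. 
This is an illustration under assumptions, not the actual TopicBasedLoadShedder 
code: LoadShedSketch and topicsToRelease are hypothetical names, the load list 
is assumed to include the current hub, and rounding the average up is my 
choice. 

```java
import java.util.List;

// Hypothetical sketch of the calculation, not the real TopicBasedLoadShedder.
final class LoadShedSketch {
    /**
     * @param myTopics         topics owned by the current hub
     * @param hubLoads         reported topic counts for all hubs
     *                         (assumed to include the current hub)
     * @param tolerancePercent allowed percentage above the cluster average
     * @param maxLoadToShed    upper bound on topics released in one call
     * @return number of topics the current hub should release
     */
    static long topicsToRelease(long myTopics, List<Long> hubLoads,
                                double tolerancePercent, long maxLoadToShed) {
        if (hubLoads.isEmpty()) {
            return 0;
        }
        double sum = 0;
        for (long load : hubLoads) {
            sum += load;
        }
        double average = sum / hubLoads.size();
        // average + average * tolerance percentage / 100
        double threshold = average + average * tolerancePercent / 100.0;
        if (myTopics <= threshold) {
            return 0; // within tolerance, nothing to shed
        }
        // Release enough to reach the average, capped per call.
        long excess = myTopics - (long) Math.ceil(average);
        return Math.max(0, Math.min(excess, maxLoadToShed));
    }
}
```

For example, with reported loads of 30, 10 and 20 topics and a 10% tolerance, 
the average is 20 and the threshold is 22, so the hub owning 30 topics would 
release 10 of them, or fewer if the per-call maximum is lower. 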
                
> Re-distributing topics among newly added hubs.
> ----------------------------------------------
>
>                 Key: BOOKKEEPER-363
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-363
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: hedwig-server
>            Reporter: Aniruddha
>
> When a new hub is added to an already existing hedwig cluster, that hub 
> should pick up some of the topics. Currently the mechanism hedwig provides is 
> to configure the time for which a topic is retained. A better approach might 
> be to run a re-balancer thread that periodically checks if topics are 
> distributed evenly among hubs and if not, releases some topics to balance the 
> load. 

