[ https://issues.apache.org/jira/browse/ZOOKEEPER-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927991#comment-17927991 ]

Chevaris commented on ZOOKEEPER-571:
------------------------------------

An even simpler brute-force approach is to offer a public API that lets clients close the socket (much the same as what ZooKeeper.getTestable().closeSocket() does). Clients would simply call the API periodically (e.g. every 30 minutes), with a bit of random dispersion added to the time of the first disconnection.

In any situation in which client connections are unbalanced (e.g. a ZK server has crashed and later restarted), the solution will converge towards a good balance. Does this make sense?

The solution obviously requires some time to converge, but the only change needed is to make the closeSocket() method public in the ZooKeeper class. Could we at least start by making this method public, as a building block for a probably more advanced solution?

> support balancing of client load across servers in an ensemble
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-571
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-571
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum, server
>            Reporter: Patrick D. Hunt
>            Priority: Major
>
> Currently the ensemble does not ensure a balanced load across servers in an
> ensemble. Clients randomly connect to
> a server, which typically balances the number of sessions. However there are
> problems with this:
> 1) session count is balanced, but not session load
> 2) if server A goes down all of the sessions on that server migrate to other
> servers in the cluster randomly, this is fine, however
> when server A comes back into service it will have no sessions, and migration
> of sessions from other servers may take time
> The quorum should probably have some way of broadcasting load, and
> occasionally re-balance the sessions based on
> this information. Might be tricky though, want to ensure that we aren't
> constantly ping-ponging sessions to servers.
> Probably need some hysteresis as well as limit the frequency. Real time
> tuning would need to be supported.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
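
For illustration only, the periodic-disconnect idea from the comment could be sketched on the client side roughly as follows. This is a minimal sketch under stated assumptions, not a proposed patch: `PeriodicReconnector` and `firstDelayMillis` are hypothetical names, and the socket-closing action is passed in as a plain `Runnable` because a public closeSocket() API is exactly what the comment is asking for and may not be available to clients.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of ZooKeeper): periodically runs a "close socket" action. */
final class PeriodicReconnector implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "periodic-reconnector");
                t.setDaemon(true); // do not keep the JVM alive just for this task
                return t;
            });

    /** Random first delay in [0, periodMillis), so clients do not all disconnect at once. */
    public static long firstDelayMillis(long periodMillis) {
        return ThreadLocalRandom.current().nextLong(periodMillis);
    }

    /**
     * @param closeSocket  assumption: a callback that drops the current connection,
     *                     for example by delegating to the closeSocket() method
     *                     mentioned in the comment, if it is accessible
     * @param periodMillis interval between forced disconnects (e.g. 30 minutes)
     */
    public PeriodicReconnector(Runnable closeSocket, long periodMillis) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("periodMillis must be positive");
        }
        scheduler.scheduleAtFixedRate(() -> {
            try {
                closeSocket.run();
            } catch (RuntimeException e) {
                // Swallow so that one failure does not cancel later runs.
            }
        }, firstDelayMillis(periodMillis), periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

After each forced disconnect the client would be expected to reconnect to a server from its connect string, which is what lets the session distribution drift back towards balance over time. The random first delay only spreads out the first disconnection; later ones follow at the fixed period.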