[ https://issues.apache.org/jira/browse/SLING-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123504#comment-15123504 ]
Timothee Maret commented on SLING-5435:
---------------------------------------

bq. If those "processes" don't exist, then it sounds like there is nothing to stop a faster leader election implementation that is not slowed down by the latency required to ensure a repository reaches a consistent state.

As I wrote in my previous comment, I think that all consumers of the {{TopologyEventListener}} currently in Sling make a legitimate case for waiting on the repository. However, my point is that not every {{TopologyEventListener}} needs to wait on the repository. I have shared a list of use cases as well. Thus, instead of imposing the repository wait on every listener, we could allow a listener to be configured so that it does not wait for the repository to sync. The implementation proposal has been discussed above as well. I propose to keep the focus of this issue on that goal; alternatively, I would open a separate one.

bq. If that's really the case, then this issue can be closed and replaced by a new issue titled something like "Implement leader election using RAFT over the network" in the same way that systems like etcd perform elections.

I have already opened SLING-5423 to track that and I am working on it.

> Decouple processes that depend on cluster leader elections from the cluster
> leader elections.
> ---------------------------------------------------------------------------------------------
>
>                 Key: SLING-5435
>                 URL: https://issues.apache.org/jira/browse/SLING-5435
>             Project: Sling
>          Issue Type: Improvement
>          Components: General
>            Reporter: Ian Boston
>
> Currently there are many processes in Sling that must complete before a Sling
> Discovery cluster leader election is declared complete. These processes
> include things like transferring all Jobs from the old leader to the new
> leader and waiting for the data to become visible on the new leader. This
> adds overhead to the leader election process, which forces higher than
> desirable timeouts for elections and heartbeats. Those higher timeouts
> preclude the use of more efficient election and distributed consensus
> algorithms, such as those implemented in etcd, ZooKeeper or
> implementations of RAFT.
> If the election could be declared complete, leaving individual components to
> manage their own post-election operations (i.e. decoupling those processes from
> the election), then faster election or alternative Discovery implementations,
> such as the one implemented on etcd, could be used.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
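To make the decoupling discussed in this issue concrete, here is a minimal Java sketch, assuming a dispatcher that declares the election complete immediately and lets each listener opt out of the repository wait. It does not use the real Sling API (`org.apache.sling.discovery.TopologyEventListener#handleTopologyEvent(TopologyEvent)`); the names `Listener`, `requiresRepositorySync`, `Dispatcher`, `onElectionComplete` and the `CompletableFuture` used as a "repository has synced" signal are all hypothetical, chosen only for illustration, and are not the implementation proposal referred to in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class DecoupledElectionSketch {

    // Hypothetical stand-in for a topology listener; NOT the Sling
    // org.apache.sling.discovery.TopologyEventListener API.
    interface Listener {
        void onLeaderElected(String leaderId);

        // Hypothetical opt-in flag: true means "deliver only after the
        // repository has caught up with the new topology" (today's behaviour).
        default boolean requiresRepositorySync() {
            return true;
        }
    }

    // Hypothetical dispatcher: the election counts as complete as soon as
    // onElectionComplete is called; the repository sync is not part of it.
    static final class Dispatcher {
        private final List<Listener> listeners = new ArrayList<>();

        void register(Listener l) {
            listeners.add(l);
        }

        // repositorySync is assumed to be completed by some other component
        // once the repository has reached a consistent state.
        void onElectionComplete(String leaderId, CompletableFuture<Void> repositorySync) {
            for (Listener l : listeners) {
                if (l.requiresRepositorySync()) {
                    // Deferred: runs only once the sync future completes.
                    repositorySync.thenRun(() -> l.onLeaderElected(leaderId));
                } else {
                    // Immediate: the listener only needs to know who leads.
                    l.onLeaderElected(leaderId);
                }
            }
        }
    }

    // Runs one election with one waiting and one non-waiting listener and
    // returns the order in which the events were observed.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Dispatcher d = new Dispatcher();

        // e.g. job transfer: must see the repository in a consistent state
        d.register(leader -> log.add("job-transfer sees " + leader));

        // a listener that opts out of the repository wait
        d.register(new Listener() {
            @Override
            public void onLeaderElected(String leader) {
                log.add("fast listener sees " + leader);
            }

            @Override
            public boolean requiresRepositorySync() {
                return false;
            }
        });

        CompletableFuture<Void> sync = new CompletableFuture<>();
        d.onElectionComplete("instance-2", sync);
        log.add("election complete"); // nothing has waited on the repository yet

        sync.complete(null); // repository has caught up; deferred listener runs now
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Running `main` prints the non-waiting listener first, then "election complete", then the job-transfer listener once the sync future completes. Only listeners that declare a need for repository consistency pay that latency, so the election itself no longer does. How such an opt-out would actually be declared in Sling (for example as an OSGi service property) is left open in this sketch.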