[
https://issues.apache.org/jira/browse/SLING-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123504#comment-15123504
]
Timothee Maret edited comment on SLING-5435 at 1/29/16 2:21 PM:
----------------------------------------------------------------
bq. If those "processes" don't exist, then it sounds like there is nothing to
stop a faster leader election implementation that is not slowed down by the
latency required to ensure a repository reaches a consistent state.
As I wrote in my previous comment, I think that all current consumers of
{{TopologyEventListener}} in Sling have a legitimate reason to wait on the
repository. However, my point is that not every {{TopologyEventListener}}
needs to wait on the repository. I have shared a list of use cases as well.
Thus, instead of imposing the repository wait on every listener, we could allow
a listener to be configured so that it does not have to wait for a repository
sync. The implementation proposal has been discussed above as well.
I propose to keep the focus of this issue on that goal. Alternatively, I can
open a separate issue.
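To make the idea concrete, here is a minimal sketch in Java of a per-listener
opt-out. It is only an illustration under stated assumptions: {{Listener}},
{{Dispatcher}}, the {{waitForRepositorySync}} flag and the method names are all
hypothetical stand-ins, not the Sling Discovery API and not the implementation
proposal discussed above.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSyncSketch {

    /** Hypothetical stand-in for a topology listener; not the Sling interface. */
    interface Listener {
        void onTopologyChanged(String newLeaderId);
    }

    /** Hypothetical dispatcher that delivers one election result in two phases. */
    static class Dispatcher {
        private final List<Listener> immediate = new ArrayList<>();
        private final List<Listener> deferred = new ArrayList<>();
        private String pendingLeaderId; // election result still awaiting repository sync

        /** Assumed configuration point: each listener chooses whether to wait. */
        void register(Listener listener, boolean waitForRepositorySync) {
            (waitForRepositorySync ? deferred : immediate).add(listener);
        }

        /** The election is over: notify listeners that opted out of the wait. */
        void electionCompleted(String newLeaderId) {
            pendingLeaderId = newLeaderId;
            for (Listener l : immediate) {
                l.onTopologyChanged(newLeaderId);
            }
        }

        /** The repository is consistent: notify the listeners that asked to wait. */
        void repositorySynced() {
            if (pendingLeaderId == null) {
                return; // no election result is pending
            }
            for (Listener l : deferred) {
                l.onTopologyChanged(pendingLeaderId);
            }
            pendingLeaderId = null;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.register(id -> log.add("fast:" + id), false);
        dispatcher.register(id -> log.add("synced:" + id), true);

        dispatcher.electionCompleted("instance-2");
        System.out.println(log); // [fast:instance-2]

        dispatcher.repositorySynced();
        System.out.println(log); // [fast:instance-2, synced:instance-2]
    }
}
```

With such a flag, existing listeners would keep the current behaviour by
registering with {{waitForRepositorySync}} set to true, while listeners that do
not depend on repository state would be notified as soon as the election ends.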
bq. If that's really the case, then this issue can be closed and replaced by a
new issue titled something like "Implement leader election using RAFT over the
network" in the same way that systems like etcd perform elections.
I have already opened SLING-5423 to track that and I am working on it.
In general, a discovery implementation that is "too fast" can be made "slow
enough" with the piece of code [~egli] mentioned earlier in this thread.
> Decouple processes that depend on cluster leader elections from the cluster
> leader elections.
> ---------------------------------------------------------------------------------------------
>
> Key: SLING-5435
> URL: https://issues.apache.org/jira/browse/SLING-5435
> Project: Sling
> Issue Type: Improvement
> Components: General
> Reporter: Ian Boston
>
> Currently there are many processes in Sling that must complete before a Sling
> Discovery cluster leader election is declared complete. These processes
> include things like transferring all Jobs from the old leader to the new
> leader and waiting for the data to become visible on the new leader. This
> adds overhead to the leader election process, which in turn forces a higher
> than desirable timeout for elections and heartbeats. This higher than
> desirable timeout precludes the use of more efficient election and
> distributed consensus algorithms as implemented in etcd, ZooKeeper or
> implementations of RAFT.
> If the election could be declared complete, leaving individual components to
> manage their own post-election operations (i.e. decoupling those processes
> from the election), then faster election or alternative Discovery
> implementations such as the one implemented on etcd could be used.