[ https://issues.apache.org/jira/browse/SLING-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126058#comment-15126058 ]
Stefan Egli commented on SLING-5435:
------------------------------------

[~cziegeler], re
bq. It might happen that the old leader did a change, which the new leader doesn't see yet and therefore the new leader might try a similar operation

That was the original thinking behind doing the sync, yes. A certain class of TopologyEventListeners (for example job handling) can be written in a way that is resilient to repository delays, namely by forcing a 'repository sync' on a particular node: making a change that forces a conflict even if the old leader's changes arrive late (see the first sketch below). However, I don't think this is possible in all cases - quite apart from the fact that when writing a TopologyEventListener you might not even be aware of these subtleties. Additionally, there are 'derivative cases' - such as the Sling Scheduler - which only check the 'leader' flag and then behave leader-like or not. What 'leader-like' implies is unclear and can perhaps not always be guarded via a repository-synched flag (i.e. one designed to produce a conflict in these delay cases). I think it helps the typical implementor of a TopologyEventListener to write leader-failover-stable code when it receives {{TOPOLOGY_CHANGED}} only after a repo sync. Leaving this to each and every listener implementor would likely result in duplicated, and less stable, code.

[~marett], re
bq. For those use cases, with the given discovery API and implementations, it is already possible to avoid the delay by polling the DiscoveryService instead of handling the TopologyEventListener events. However, I believe it is a rather discouraged thing.

I disagree. You cannot poll the DiscoveryService to get {{isCurrent}} true *before* the TopologyEventListeners get informed. In fact, these two things are coupled: {{TopologyView.isCurrent()}} and the {{TOPOLOGY_CHANGED}} event are synchronized, i.e. {{isCurrent()}} only returns true once discovery starts sending out {{TOPOLOGY_CHANGED}} events (see the second sketch below).

bq. LEADER_CHANGED

Some comments that come to mind:
* Alternative suggestion for the name: {{TOPOLOGY_CHANGED_UNSYNCHED}}
* If we introduce this event, it has to be kept backwards-compatible, i.e. after this event you must still get a {{TOPOLOGY_CHANGED}} event.
* I still see the risk of breaking client code with this, as some might do things like "{{if (event.getType() != TOPOLOGY_CHANGING){}}}" or similar - in which case the new event type might trigger execution where that was not intended.
* We should think about how to keep the API symmetric: currently there are two fully equivalent variants, polling via {{DiscoveryService.getTopology()}} and push via {{TopologyEventListener}}. The new event would so far only be available via push.
* If we do it the other way around, making this a property of the {{TopologyEventListener}} registration ("{{unsynchronized=true}}"), it becomes more backwards-compatible, as only listeners that set the property are affected. However, it would still result in an asymmetric API, and I think we should address that in both cases.
* One possibility for providing this info in the poll variant could be {{getUnsynchronizedTopology()}} (sounds somewhat ugly though). See the third sketch below for how the two variants might line up.

Overall I tend to think that going via a listener property introduces less friction, but a new event type would certainly also be possible.
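To make the 'force a conflict' idea concrete, here is a minimal sketch, assuming plain JCR and a well-known sync node. The path {{/var/myapp/leader-sync}} and the property name are purely illustrative, and whether a late write from the old leader actually surfaces as an {{InvalidItemStateException}} on save depends on the repository's conflict handling:

{code:java}
import javax.jcr.InvalidItemStateException;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

/**
 * Illustrative only: write to a well-known node so that a delayed
 * change from the old leader surfaces as a conflict instead of being
 * silently overwritten.
 */
public class LeaderSyncToken {

    // Hypothetical path; a real listener would pick its own location.
    private static final String SYNC_PATH = "/var/myapp/leader-sync";

    public boolean tryLeaderSync(Session session, String slingId) {
        try {
            session.refresh(false); // discard any stale local state
            Node syncNode = session.getNode(SYNC_PATH);
            // Both the old and the new leader write the same property,
            // so a late write from the old leader forces a conflict.
            syncNode.setProperty("leaderId", slingId);
            session.save();
            return true;
        } catch (InvalidItemStateException conflict) {
            // A delayed change from the old leader was detected:
            // back off and retry rather than acting as leader.
            return false;
        } catch (RepositoryException e) {
            return false;
        }
    }
}
{code}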
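And to illustrate the coupling between poll and push, as well as the fragile event-type check from the list above, a sketch against the existing discovery API (the listener class itself is made up):

{code:java}
import org.apache.sling.discovery.DiscoveryService;
import org.apache.sling.discovery.TopologyEvent;
import org.apache.sling.discovery.TopologyEventListener;
import org.apache.sling.discovery.TopologyView;

public class LeaderAwareListener implements TopologyEventListener {

    private volatile boolean leader;

    @Override
    public void handleTopologyEvent(TopologyEvent event) {
        // Fragile pattern: anything that is not TOPOLOGY_CHANGING is
        // treated as a settled view. A new, unsynchronized event type
        // would slip through this check and trigger leader-like
        // behaviour earlier than the author intended.
        if (event.getType() != TopologyEvent.Type.TOPOLOGY_CHANGING) {
            TopologyView view = event.getNewView();
            leader = view.getLocalInstance().isLeader();
        } else {
            leader = false;
        }
    }

    // Equivalent poll variant: isCurrent() only turns true once
    // discovery starts delivering TOPOLOGY_CHANGED, so polling cannot
    // observe the new view any earlier than the push variant.
    public boolean isLeader(DiscoveryService discoveryService) {
        TopologyView view = discoveryService.getTopology();
        return view.isCurrent() && view.getLocalInstance().isLeader();
    }
}
{code}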
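Finally, purely to make the symmetry point concrete, a hypothetical sketch of how the poll counterpart could line up with the listener property - neither exists today, both names are just the suggestions from this comment:

{code:java}
import org.apache.sling.discovery.TopologyView;

// Hypothetical sketch only - nothing here is part of the discovery API.
public interface DiscoveryServiceExtension {

    // Poll variant: the settled view, exactly as
    // DiscoveryService.getTopology() behaves today.
    TopologyView getTopology();

    // Proposed poll counterpart to the early event: returns the new
    // view before the repository sync has completed, mirroring what a
    // listener registered with "unsynchronized=true" would get pushed.
    TopologyView getUnsynchronizedTopology();
}
{code}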
> Decouple processes that depend on cluster leader elections from the cluster leader elections
> ---------------------------------------------------------------------------------------------
>
>                 Key: SLING-5435
>                 URL: https://issues.apache.org/jira/browse/SLING-5435
>             Project: Sling
>          Issue Type: Improvement
>          Components: General
>            Reporter: Ian Boston
>
> Currently there are many processes in Sling that must complete before a Sling Discovery cluster leader election is declared complete. These processes include things like transferring all Jobs from the old leader to the new leader and waiting for the data to appear visible on the new leader. This introduces an additional overhead to the leader election process which introduces a higher than desirable timeout for elections and heartbeat. This higher than desirable timeout precludes the use of more efficient election and distributed consensus algorithms as implemented in Etcd, Zookeeper or implementations of RAFT.
> If the election could be declared complete leaving individual components to manage their own post election operations (ie decoupling those processes from the election), then faster election or alternative Discovery implementations such as the one implemented on etcd could be used.