[ 
https://issues.apache.org/jira/browse/SLING-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126058#comment-15126058
 ] 

Stefan Egli commented on SLING-5435:
------------------------------------

[~cziegeler], re
bq. It might happen that the old leader did a change, which the new leader 
doesn't see yet and therefore the new leader might try a similar operation
That was the original thinking behind doing the sync, yes.

A certain class of TopologyEventListeners (such as job handling) can be 
written in a way that is resilient to repository delays, namely by forcing a 
'repository sync' for a particular node: making a change that forces a 
conflict even when the old leader's changes are delayed.
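As an illustration, here is a minimal sketch of what such a conflict-forcing 
sync write could look like, assuming a plain JCR session; the path 
{{/var/myjobs/sync}}, the property name and the method name are made up for 
this example:

{code:java}
import javax.jcr.InvalidItemStateException;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class RepositorySyncCheck {

    /**
     * Sketch only: write a marker to the node the old leader would also have
     * touched. If the old leader's (delayed) change shows up concurrently,
     * save() fails with a conflict and the caller knows the repository state
     * is not yet in sync.
     */
    public boolean tryRepositorySync(Session session) throws RepositoryException {
        try {
            Node syncNode = session.getNode("/var/myjobs/sync"); // made-up path
            syncNode.setProperty("syncToken", System.currentTimeMillis());
            session.save();
            return true; // no conflict: our view of this node is up to date
        } catch (InvalidItemStateException conflict) {
            session.refresh(false); // drop our pending change and retry later
            return false;
        }
    }
}
{code}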

However, I think this is not possible in all cases - quite apart from the fact 
that, when writing a TopologyEventListener, you might not even be aware of 
these subtleties. 

Additionally, there might be 'derivative cases' - such as the Sling Scheduler 
- which only check the 'leader' flag and then behave leader-like or not. What 
that 'leader-like' implies is unclear and can perhaps not always be 'guarded' 
via a repository-synced flag (which tries to provoke a conflict in these 
delay cases).
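To illustrate that 'derivative' pattern, here is a minimal, hypothetical 
listener (not the actual Sling Scheduler code) that only consults the leader 
flag:

{code:java}
import org.apache.sling.discovery.TopologyEvent;
import org.apache.sling.discovery.TopologyEventListener;

public class LeaderFlagOnlyListener implements TopologyEventListener {

    @Override
    public void handleTopologyEvent(TopologyEvent event) {
        if (event.getType() == TopologyEvent.Type.TOPOLOGY_INIT
                || event.getType() == TopologyEvent.Type.TOPOLOGY_CHANGED) {
            boolean leader = event.getNewView().getLocalInstance().isLeader();
            if (leader) {
                // behave 'leader-like': start leader-only work, whatever
                // that implies for this component (the unclear part)
            } else {
                // stop leader-only work
            }
        }
    }
}
{code}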

I think it helps the default implementor of a TopologyEventListener write 
leader-failover-stable code if it receives the {{TOPOLOGY_CHANGED}} event only 
after a repo-sync. Leaving this to each and every listener implementor might 
result in duplicated code as well as less stable code.

[~marett], re
bq. For those use cases, with the given discovery API and implementations, it 
is already possible to avoid the delay by polling the DiscoveryService instead 
of handling the TopologyEventListener events. However, I believe it is a rather 
discouraged thing.

I disagree. You cannot poll DiscoveryService to get {{isCurrent}} true *before* 
the TopologyEventListeners get informed. In fact, these two things are coupled: 
{{TopologyView.isCurrent()}} and the {{TOPOLOGY_CHANGED}} event are 
synchronized. That is, {{isCurrent()}} only returns true once discovery starts 
sending out {{TOPOLOGY_CHANGED}} events. 
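As a side note, here is a minimal sketch of the poll-based check, assuming a 
reference to the {{DiscoveryService}} (e.g. injected via OSGi); because 
{{isCurrent()}} is coupled to {{TOPOLOGY_CHANGED}} as described, this does not 
observe the new, settled view any earlier than a listener would:

{code:java}
import org.apache.sling.discovery.DiscoveryService;
import org.apache.sling.discovery.TopologyView;

public class PollingExample {

    private final DiscoveryService discoveryService;

    public PollingExample(DiscoveryService discoveryService) {
        this.discoveryService = discoveryService;
    }

    public boolean isSettledLeader() {
        TopologyView view = discoveryService.getTopology();
        // isCurrent() only becomes true once discovery starts sending
        // TOPOLOGY_CHANGED, i.e. not before the listeners get informed
        return view.isCurrent() && view.getLocalInstance().isLeader();
    }
}
{code}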

bq. LEADER_CHANGED
Some comments that come to mind:
* Alternative suggestion for the name: {{TOPOLOGY_CHANGED_UNSYNCHED}}
* If we introduce this event, it would have to be kept backwards-compatible, 
i.e. after this event you would still have to get a {{TOPOLOGY_CHANGED}} event.
* I still see the risk of breaking client code with this, as some might do 
things like "{{if (event.getType() != TOPOLOGY_CHANGING){}}}" or similar - in 
which case the new event type might result in execution that was not intended 
(see the first sketch below this list).
* We should think about how we could keep the API symmetric: currently there 
are two fully equivalent variants: polling via 
{{DiscoveryService.getTopology()}} or push via {{TopologyEventListener}}. The 
new event is so far only available via push.
* If we do it the other way around, by making this a property of the 
{{TopologyEventListener}} ("{{unsynchronized=true}}"), we make it more 
backwards-compatible, as only those listeners that set the property are 
affected (see the second sketch below this list). However, it would still 
result in an asymmetric API, and I think we should do something about that in 
both cases.
* Perhaps one possibility for providing this info in the poll variant could be: 
{{getUnsynchronizedTopology()}} (sounds somewhat ugly, though).
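Regarding the backwards-compatibility risk mentioned above, a sketch of 
(hypothetical) client code that a new event type could break:

{code:java}
import org.apache.sling.discovery.TopologyEvent;
import org.apache.sling.discovery.TopologyEventListener;
import org.apache.sling.discovery.TopologyView;

public class FragileListener implements TopologyEventListener {

    @Override
    public void handleTopologyEvent(TopologyEvent event) {
        // treats "anything but TOPOLOGY_CHANGING" as a settled view today;
        // a new LEADER_CHANGED / TOPOLOGY_CHANGED_UNSYNCHED event would slip
        // through this check and trigger work on a not-yet-synced view
        if (event.getType() != TopologyEvent.Type.TOPOLOGY_CHANGING) {
            startWorkBasedOn(event.getNewView()); // made-up helper
        }
    }

    private void startWorkBasedOn(TopologyView view) {
        // ...
    }
}
{code}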
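And a sketch of how the listener-property variant could look from a client's 
perspective; the {{unsynchronized=true}} property is only proposed here, it 
does not exist in the current API:

{code:java}
import org.apache.sling.discovery.TopologyEvent;
import org.apache.sling.discovery.TopologyEventListener;
import org.osgi.service.component.annotations.Component;

// Opt-in sketch: only listeners registered with the proposed
// "unsynchronized=true" property would receive the early, pre-repo-sync
// notification; all other listeners keep today's behaviour.
@Component(service = TopologyEventListener.class,
        property = { "unsynchronized=true" })
public class EarlyLeaderChangeListener implements TopologyEventListener {

    @Override
    public void handleTopologyEvent(TopologyEvent event) {
        // may now be called before the repository sync has completed
    }
}
{code}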

Overall I tend to think that going via a listener property introduces less 
friction, but a new event type would certainly also be possible...

> Decouple processes that depend on cluster leader elections from the cluster 
> leader elections.
> ---------------------------------------------------------------------------------------------
>
>                 Key: SLING-5435
>                 URL: https://issues.apache.org/jira/browse/SLING-5435
>             Project: Sling
>          Issue Type: Improvement
>          Components: General
>            Reporter: Ian Boston
>
> Currently there are many processes in Sling that must complete before a Sling 
> Discovery cluster leader election is declared complete. These processes 
> include things like transferring all Jobs from the old leader to the new 
> leader and waiting for the data to become visible on the new leader. This 
> adds overhead to the leader election process, which forces a higher than 
> desirable timeout for elections and heartbeats; that timeout in turn 
> precludes the use of more efficient election and distributed consensus 
> algorithms as implemented in Etcd, Zookeeper or implementations of RAFT.
> If the election could be declared complete, leaving individual components to 
> manage their own post-election operations (i.e. decoupling those processes 
> from the election), then faster elections or alternative Discovery 
> implementations such as the one implemented on etcd could be used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
