Hi Stefan,

Thanks for your insight, more input inline below.
2015-03-17 12:00 GMT+01:00 Stefan Egli <[email protected]>:

> Hi Tim,
>
> Currently, unfortunately, only the inverse is configurable.
>
> Due to an edge-case in job distribution (job is started and executed on
> CRX master, master crashes before slave is updated, slave becomes master,
> slave executes job a second time) the suggestion is to make anything *but*
> the CRX master become the discovery leader.

Ok, but isn't it still prone to double execution even when leader != master?
Assume one master, two slaves and the following scenario: the master
receives a job, the job is replicated to the slaves, one slave executes the
job and commits its changes, that slave crashes before the changes are
replicated, and the other slave picks up the job and executes it again.

Can we offer guarantees against double, out-of-order or missing execution
without some sort of distributed locking, a way to make sure the content is
replicated, or some sort of centralised job dispatcher?

Anyway, AFAIU enforcing 'leader != master' would be against an
active/passive setup. Indeed, if enabled, an application could process
either on exactly one CRX slave, as in the snippet below

    if (leader) {
        // process on one CRX slave
    }

or on an undefined type of instance wrt clustering

    if (!leader) {
        // process on one CRX slave or on the CRX master
        // if there is only one instance we may not process it at all
    }

> So this was an explicit goal
> of the current implementation. To achieve this, a repository descriptor is
> configurable (leaderElectionRepositoryDescriptor) which, when set, can
> enforce exactly this: that the crx-master is not the leader if there is
> any slave around.

Ok, in our setup the descriptor is not set.

>
> I fear there is no explicit way atm to force the behavior you want. About
> the closest one I can think of is: the leader is defined to be stable, ie
> once an instance is leader, it stays leader until it leaves/crashes. Or in
> other words: the first instance started on a fresh setup becomes leader.

IIUC, currently we can have either:
I. strong guarantees that 'leader != master', or
II. best effort to enforce 'leader == master'.

Assuming that avoiding quirks in job processing requires a broader solution
than what was introduced in SLING-3253, wouldn't it make sense to allow
guaranteeing II.? IMO the leader would still be relatively stable (not
impacted by the addition of new instances to the topology) and it would
allow us to guarantee an active/passive cluster setup.

wdyt?

Regards,

Timothee

> But that might not suffice in your case I assume..
>
> Cheers,
> Stefan
>
> On 3/16/15 12:15 PM, "Timothée Maret" <[email protected]> wrote:
>
> > Hi,
> >
> > In a deployment, we use a CRX (TarPM) active/passive cluster composed of
> > one master instance and one slave instance.
> > We run background jobs on this deployment and we want to have them run on
> > the CRX master only, in order to guarantee no writes on the slave, thus
> > keeping the active/passive scheme.
> >
> > The way we currently do it is by checking, in our background code, that
> > the instance is the Sling leader and only running the code if the
> > instance is the leader.
> >
> > This works only if the following holds at any time
> >
> > Sling leader == CRX master
> >
> > We experienced some cases where this seemed not to be the case.
> > Should we expect the above-mentioned mapping to be valid, or should we
> > find another way to enforce background services to only execute on the
> > master? We could use some CRX-specific code for that, but then our code
> > would not be portable to other MKs.
> >
> > Regards,
> >
> > Timothee
