Hi Timothee,

On 3/17/15 5:37 PM, "Timothée Maret" <timothee.ma...@gmail.com> wrote:

>>Due to an edge-case in job distribution (job is started and executed on
>> CRX master, master crashes before slave is updated, slave becomes
>>master,
>> slave executes job a second time) the suggestion is to make anything
>>*but*
>> the CRX master become the discovery leader.
>
>
>Ok, but isn't it still prone to double execution even when leader !=
>master
>?
>Assuming one master, two slaves and the following scenario: master
>receives
>a job, the job replicated to slaves, one slave executes the job and
>commits
>its changes, slave crashes before the changes are replicated, the other
>slave picks the job and executes it again.

The time window between committing the changes and finishing the job is
much smaller though. There is no absolute guarantee, right, but it is less
likely.

>Can we offer guarantees against double execution, unordered or missing
>execution without some sort of distributed locking, a way to make sure the
>content is replicated or some sort of centralised job dispatcher ?

An absolute guarantee, no. And I don't think we aim to magically make this
work with this 'slave be the leader' default. But it reduces the
likelihood a lot.

Re unordered/missing execution: if there is network partitioning (real or
pseudo) then the ordering would no longer be guaranteed, agreed. Not
sure if you could really miss a job execution, though! In any case, network
partitioning is not currently supported.

>Anyway, AFAIU enforcing 'leader != master' would be against an
>active/passive setup.
>Indeed, if enabled, an application could either process on exactly one crx
>slave 

Right. Why would it be 'against' such a setup though? The application
should not depend on the underlying cluster technology nor deployment.
Ideally it would just make use of the fact that one instance in the
cluster is nominated 'leader' and if it has something to execute only
once, then it should choose that leader to do it.
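
To illustrate the idea, here is a minimal sketch of an application-side
guard that only runs a task while the local instance is leader. The class
and method names are hypothetical and not part of the Sling discovery API;
in a real bundle the flag would be updated from whatever topology change
callback discovery gives you.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: runs "execute only once" work on the leader only. */
final class LeaderOnlyRunner {
    // Last known leader state of the local instance (assumed to be fed
    // by a topology change callback; not wired up in this sketch).
    private final AtomicBoolean leader = new AtomicBoolean(false);

    /** Call this whenever the topology changes. */
    void onTopologyChanged(boolean isLeaderNow) {
        leader.set(isLeaderNow);
    }

    /** Runs the task if this instance is currently leader; returns whether it ran. */
    boolean runIfLeader(Runnable task) {
        if (!leader.get()) {
            return false; // some other instance is expected to do the work
        }
        task.run();
        return true;
    }
}
```

Note that this does not close the double-execution window discussed above;
it only keeps the application independent of how the leader gets chosen.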

>>
>> I fear there is no explicit way atm to force the behavior you want.
>>About
>> the closest one I can think of is: the leader is defined to be stable,
>>ie
>> once an instance is leader, it stays leader until it leaves/crashes. Or
>>in
>> other words: the first instance started on a fresh setup becomes leader.
>>
>
>IIUC, currently we can have either I. strong guarantees that 'leader !=
>master' or II. best effort to enforce 'leader == master'.
>Assuming avoiding quirks in jobs processing requires a broader solution
>than what was introduced in SLING-3253, wouldn't it make sense to allow
>guaranteeing II. ?

What you can always do is make your implementation check the underlying
repository descriptor itself - and take that one if it is set, otherwise
use the Sling discovery.
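
A rough sketch of that fallback, under stated assumptions: the descriptor
key "crx.cluster.master" is my assumption and should be checked against
your repository, and the two functional parameters are stand-ins for
javax.jcr.Repository.getDescriptor(String) and for the discovery leader
check, so that the snippet stays self-contained.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Function;

/** Hypothetical helper: prefer the repository descriptor, else discovery. */
final class SingleExecutorCheck {
    // Assumed descriptor key; verify what your repository actually exposes.
    static final String MASTER_DESCRIPTOR = "crx.cluster.master";

    private SingleExecutorCheck() {}

    /**
     * @param descriptors     stand-in for Repository.getDescriptor(String);
     *                        returns null when the key is not set
     * @param discoveryLeader stand-in for the discovery "am I leader" check
     */
    static boolean shouldExecute(Function<String, String> descriptors,
                                 BooleanSupplier discoveryLeader) {
        String master = descriptors.apply(MASTER_DESCRIPTOR);
        if (master != null) {
            // Descriptor is set: follow the repository's view of the master.
            return Boolean.parseBoolean(master);
        }
        // Descriptor is not set: fall back to the discovery leader.
        return discoveryLeader.getAsBoolean();
    }
}
```

With this, an instance whose descriptor says "true" executes regardless of
what discovery reports, which is effectively option II done on the
application side.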

>IMO the leader would still be relatively stable (not impacted by addition
>of new instances in the topology) and would allow to guarantee an
>active/passive cluster setup.

Both I and II have the negative side-effect that in case the master
crashes, the leader might change. So in that sense, they both break the
'strong leader' argument - so it would not introduce anything more
negative there.

So yes, discovery could support II - but you could also read the
descriptor explicitly as an alternative.

Depends on which way you'd like to go - if you'd like to have this though,
could you pls create a ticket?

Cheers,
Stefan

