[jira] [Comment Edited] (SLING-3432) pseudo network partition causes job deserialization issue in a cluster (when reading while job is being reassigned)

Stefan Egli (JIRA) Wed, 15 Apr 2015 07:49:17 -0700

    [ https://issues.apache.org/jira/browse/SLING-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495936#comment-14495936 ]


Stefan Egli edited comment on SLING-3432 at 4/15/15 2:47 PM:
-------------------------------------------------------------

Note that
[the above|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494]
does not solve the problem where the underlying repository is eventually
consistent and the configured heartbeat timeout is too low to cover all delays
that such an eventually consistent repository might produce under load.
Consider the following:
# a cluster consists of 3 nodes: A, B and C; A is the leader
# writes from B and C are fast and can be read quickly by all 3 nodes
# writes from A, however, are slow (i.e. A behaves asymmetrically: slow writes but fast reads)
# at some point writes from A take longer than the configured heartbeat timeout: B and C notice this and vote on a new clusterView consisting only of B and C, and (let's say) B becomes leader.
#* meanwhile A is still happy: it sees the heartbeats of B and C in time and would not start a new voting.
# at some later point (with a *certain read delay*) A sees that B and C have declared a new {{/establishedViews}} - at this point it would (according to the new rule above) immediately send a TOPOLOGY_CHANGING and things would be 'ok' again.
#* *but* until it sends this event - *between 4. and 5. - we have two leaders: A and B*! -> thus we could still see the issues reported here in SLING-3432 during that small timeframe (which is basically the time it takes for the new established view declared by B and C to be read by A).
#* at a later time, e.g. when the delays in the repository have come down, A would rejoin the cluster - but must *not become leader* again, as the leader is B and must stay stable.
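
The timeline above can be sketched as a small tick-based simulation. This is only an illustration under stated assumptions, not code from discovery.impl: the class, the method and all parameters ({{DualLeaderWindow}}, {{dualLeaderTicks}}, {{slowFrom}}, {{slowWriteDelay}}, {{readDelayOfA}}) and the numbers are hypothetical. It assumes A writes one heartbeat per tick, that B and C declare the new view in the first tick in which A's newest visible heartbeat is older than the timeout, and that A keeps acting as leader until it reads that view.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the scenario; nothing here is a discovery.impl API.
class DualLeaderWindow {

    /**
     * Returns the ticks during which both A and B consider themselves leader.
     * A heartbeat written by A at tick w becomes visible to B and C at
     * w (before slowFrom) or at w + slowWriteDelay (from slowFrom on).
     */
    static List<Integer> dualLeaderTicks(int heartbeatTimeout, int slowFrom,
                                         int slowWriteDelay, int readDelayOfA,
                                         int horizon) {
        List<Integer> dualLeaderTicks = new ArrayList<>();
        int lastSeenFromA = -1; // write tick of A's newest heartbeat visible to B and C
        int newViewAt = -1;     // tick at which B and C declare the view without A (step 4)

        for (int t = 0; t <= horizon; t++) {
            // Find out which of A's heartbeats have become visible by tick t.
            for (int w = lastSeenFromA + 1; w <= t; w++) {
                int delay = w < slowFrom ? 0 : slowWriteDelay;
                if (w + delay <= t) {
                    lastSeenFromA = w;
                }
            }
            // Step 4: B and C time A out and vote on a view of their own.
            if (newViewAt < 0 && t - lastSeenFromA > heartbeatTimeout) {
                newViewAt = t;
            }
            boolean bIsLeader = newViewAt >= 0;
            // Step 5: A only notices once it has read the new view.
            boolean aStillThinksItLeads = newViewAt < 0 || t < newViewAt + readDelayOfA;
            if (bIsLeader && aStillThinksItLeads) {
                dualLeaderTicks.add(t);
            }
        }
        return dualLeaderTicks;
    }

    public static void main(String[] args) {
        // timeout 5, A slow from tick 10 with write delay 20, A reads the new view 3 ticks late
        System.out.println(dualLeaderTicks(5, 10, 20, 3, 40)); // prints [15, 16, 17]
    }
}
```

In this model the window has exactly the length of A's read delay, and it only disappears when that delay is zero, which an eventually consistent repository without a guaranteed maximum delay cannot promise.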

This IMHO highlights the problem that a setup using an eventually consistent
repository (one with no guaranteed maximum delay) is *not* free of pseudo
network partitions or duplicate leaders under load.

Note that what is described here is not fixed by SLING-4627.


> pseudo network partition causes job deserialization issue in a cluster (when 
> reading while job is being reassigned)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SLING-3432
>                 URL: https://issues.apache.org/jira/browse/SLING-3432
>             Project: Sling
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.0.2
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: Discovery Impl 1.1.2
>
>
> There is a race condition between two instances in a cluster (e.g. oak or crx):
> Instance 1 is writing a job with a binary property while instance 2 is reading
> the job (likely triggered by discovery sending it a topologychangedevent). It
> looks like instance 2 reads the job while instance 1 is still in the process
> of completely writing the job, or at least the binary, resulting in the
> following exception:
> 04.03.2014 06:55:39.667 *WARN* [Apache Sling Job Background Loader] org.apache.sling.event.impl.jobs.JobManagerImpl Unable to read job from /var/eventing/jobs/assigned/e4337f8f-47d2-41df-b3ab-0d40b1b2acd4/slingevent:eventadmin/2014/3/3/8/45/cq.wcm.msm.job.pageEvent_9718d7db-85b4-4930-a2ba-11a80d772970_172
> java.lang.Exception: Unable to deserialize property 'pageEvent'
>         at org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:213)
>         at org.apache.sling.event.impl.jobs.JobManagerImpl.readJob(JobManagerImpl.java:538)
>         at org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobInTheBackground(BackgroundLoader.java:318)
>         at org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobsInTheBackground(BackgroundLoader.java:294)
>         at org.apache.sling.event.impl.jobs.BackgroundLoader.run(BackgroundLoader.java:203)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException: null
>         at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2280)
>         at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2749)
>         at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:779)
>         at java.io.ObjectInputStream.<init>(ObjectInputStream.java:279)
>         at org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:208)
>         ... 5 common frames omitted
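
The EOFException raised from the ObjectInputStream constructor in the trace above is what plain Java serialization produces when the stream ends before the stream header has been read. A minimal standalone sketch of that behaviour follows; the class and method names ({{EmptyBinaryRead}}, {{failsWithEof}}, {{serialize}}) are hypothetical, and the assumption that the reader saw an empty binary is an illustration only, the report does not state how many bytes were visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class; it does not use any Sling API.
class EmptyBinaryRead {

    /** Returns true if opening an ObjectInputStream over the bytes fails with EOFException. */
    static boolean failsWithEof(byte[] storedBytes) {
        // The constructor reads the serialization stream header immediately.
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(storedBytes))) {
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Serializes a value completely, as a writer that has finished its write would. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: the reader observes the binary before any byte is visible.
        System.out.println(failsWithEof(new byte[0]));           // prints true
        // Once the writer has finished, the same read succeeds.
        System.out.println(failsWithEof(serialize("pageEvent"))); // prints false
    }
}
```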



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
