[jira] [Commented] (SLING-3432) pseudo network partition causes job deserialization issue in a cluster (when reading while job is being reassigned)

Timothee Maret (JIRA) Wed, 15 Apr 2015 09:19:14 -0700

    [ https://issues.apache.org/jira/browse/SLING-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496460#comment-14496460 ]


Timothee Maret commented on SLING-3432:
---------------------------------------

bq. The listener is informed that 'something is wrong in the topology' via the 
TOPOLOGY_CHANGING event as soon as the discovery notices this.
I agree, though the information carried by the announcement is different. 
Without isolated mode, the instance can't differentiate between (I) a topology 
vote that takes a while and (II) being disconnected from the topology.
Applications may benefit from this distinction, for instance to perform operations 
(registering with services, allocating resources, etc.) only when joining or 
leaving the topology.

Also, although I guess it was not meant to be implemented based on actual 
waiting, the topology listeners would not have to wait on any particular event 
at any time.

The topology listeners would do

{code}
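void handleTopologyEvent(TopologyEvent event) {
    TopologyView view = event.getNewView();
    // view.id() stands for the proposed view identifier; "isolated" marks a disconnected instance
    if ("isolated".equals(view.id())) {
        // do things in isolated mode
    } else {
        // do things in connected mode
    }
}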
{code}
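
To make the join/leave distinction more concrete, here is a minimal, self-contained sketch of a listener that acts only on transitions between isolated and connected mode. It does not use the Sling discovery API: ViewSketch, ModeTrackingListener and the "isolated" ID are hypothetical stand-ins for the proposal above, and the recorded "join"/"leave" strings stand in for whatever the application would do at those points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a topology view; only the proposed id is modelled.
final class ViewSketch {
    static final String ISOLATED_ID = "isolated"; // assumed marker from the proposal
    private final String id;

    ViewSketch(String id) { this.id = id; }

    String id() { return id; }

    boolean isIsolated() { return ISOLATED_ID.equals(id); }
}

// Listener that reacts only to transitions between isolated and connected mode.
final class ModeTrackingListener {
    private boolean connected = false; // treated as isolated until a real view arrives
    final List<String> actions = new ArrayList<>();

    void handleNewView(ViewSketch view) {
        boolean nowConnected = !view.isIsolated();
        if (nowConnected && !connected) {
            actions.add("join");   // e.g. register with services, allocate resources
        } else if (!nowConnected && connected) {
            actions.add("leave");  // e.g. unregister, release resources
        }
        // A change from one connected view to another triggers neither action.
        connected = nowConnected;
    }
}
```

With this shape, a sequence of views such as isolated, connected, connected, isolated would trigger one join and one leave, and no waiting on any particular event is involved.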



> pseudo network partition causes job deserialization issue in a cluster (when 
> reading while job is being reassigned)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SLING-3432
>                 URL: https://issues.apache.org/jira/browse/SLING-3432
>             Project: Sling
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.0.2
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: Discovery Impl 1.1.2
>
>
> There is a race condition between two instances in a cluster (e.g. Oak or CRX): 
> instance 1 is writing a job with a binary property while instance 2 is reading 
> the job (likely triggered by discovery sending it a TOPOLOGY_CHANGED event). It 
> looks like instance 2 reads the job while instance 1 is still in the process 
> of writing the job, or at least the binary, resulting in the following 
> exception:
> 04.03.2014 06:55:39.667 *WARN* [Apache Sling Job Background Loader] org.apache.sling.event.impl.jobs.JobManagerImpl Unable to read job from /var/eventing/jobs/assigned/e4337f8f-47d2-41df-b3ab-0d40b1b2acd4/slingevent:eventadmin/2014/3/3/8/45/cq.wcm.msm.job.pageEvent_9718d7db-85b4-4930-a2ba-11a80d772970_172
> java.lang.Exception: Unable to deserialize property 'pageEvent'
>         at org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:213)
>         at org.apache.sling.event.impl.jobs.JobManagerImpl.readJob(JobManagerImpl.java:538)
>         at org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobInTheBackground(BackgroundLoader.java:318)
>         at org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobsInTheBackground(BackgroundLoader.java:294)
>         at org.apache.sling.event.impl.jobs.BackgroundLoader.run(BackgroundLoader.java:203)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException: null
>         at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2280)
>         at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2749)
>         at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:779)
>         at java.io.ObjectInputStream.<init>(ObjectInputStream.java:279)
>         at org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:208)
>         ... 5 common frames omitted
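
The EOFException in readStreamHeader is what plain Java serialization produces when an ObjectInputStream is opened on data that has no bytes yet, which fits a binary that is read before it is fully written. The sketch below reproduces only that JDK behaviour with in-memory byte arrays; PartialBinaryRead and its methods are hypothetical names for illustration and do not reflect how ResourceHelper is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class PartialBinaryRead {

    // Serializes a value with standard Java serialization (assumed storage format).
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if merely opening an ObjectInputStream on the data hits end of stream.
    static boolean failsWithEof(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return false; // stream header was read successfully
        } catch (EOFException e) {
            return true;  // the constructor ran out of bytes while reading the header
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling failsWithEof on a fully serialized value returns false, while calling it on an empty byte array returns true, mirroring a reader that sees the binary before the writer has stored any of it.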



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
