[
https://issues.apache.org/jira/browse/SLING-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498040#comment-14498040
]
Stefan Egli commented on SLING-3432:
------------------------------------
The problem is that there is a conflict with the API... the API says:
* TOPOLOGY_CHANGED always carries a new TopologyView
([TopologyEvent.getNewView() does not return null for
TOPOLOGY_CHANGED|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/TopologyEvent.java#L144])
* every TopologyView contains the local instance (unfortunately this is not
stated explicitly :( but it was definitely the intention)
([TopologyView.getLocalInstance() should never return
null|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/TopologyView.java#L47])
* every Instance(-Description) is always guaranteed to be part of a
Cluster(View) ([InstanceDescription.getClusterView() is never
null|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/InstanceDescription.java#L60])
* and every ClusterView must always have a leader ([ClusterView.getLeader() is
never
null|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/ClusterView.java#L70])
In other words: the API says that a TOPOLOGY_CHANGED event always defines the
leader of the local cluster. If you now map the isolated mode to a
TOPOLOGY_CHANGED event, you have to declare one instance of the local cluster
the leader (perhaps the isolated local instance). That results in duplicate
leaders if the isolation is due to a pseudo network partition as described in
this ticket, which is likely in an eventually consistent repository.
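To make the chain of guarantees concrete, here is a minimal sketch. It does not use the real discovery API: Instance, Cluster and View are hypothetical, simplified stand-ins for InstanceDescription, ClusterView and TopologyView, and isolatedView() is only an assumption about how an isolated instance might be mapped to a TOPOLOGY_CHANGED view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins for the discovery API types, for illustration only.
final class LeaderInvariantSketch {

    static final class Instance {
        final String slingId;
        final boolean leader;
        Cluster cluster; // set when the instance is added to a cluster

        Instance(String slingId, boolean leader) {
            this.slingId = slingId;
            this.leader = leader;
        }
    }

    static final class Cluster {
        final List<Instance> instances = new ArrayList<>();

        Cluster add(Instance instance) {
            instance.cluster = this;
            instances.add(instance);
            return this;
        }

        Instance getLeader() {
            for (Instance instance : instances) {
                if (instance.leader) {
                    return instance;
                }
            }
            return null;
        }
    }

    static final class View {
        final Instance local;

        View(Instance local) {
            this.local = local;
        }
    }

    // Follows the chain: new view -> local instance -> cluster view -> leader.
    static Instance leaderOfLocalCluster(View newView) {
        Objects.requireNonNull(newView, "TOPOLOGY_CHANGED must carry a new view");
        Instance local = Objects.requireNonNull(newView.local, "view must contain the local instance");
        Cluster cluster = Objects.requireNonNull(local.cluster, "instance must belong to a cluster");
        return Objects.requireNonNull(cluster.getLeader(), "cluster must have a leader");
    }

    // Assumed mapping of isolation to a view: a one-instance cluster that must lead itself.
    static View isolatedView(String slingId) {
        Instance self = new Instance(slingId, true);
        new Cluster().add(self);
        return new View(self);
    }

    public static void main(String[] args) {
        // View as seen on instance A, which still believes it leads the cluster {A, B}.
        Instance a = new Instance("A", true);
        new Cluster().add(a).add(new Instance("B", false));
        View viewOnA = new View(a);

        // View as seen on instance B, which is isolated and forced to name a leader.
        View viewOnB = isolatedView("B");

        System.out.println(leaderOfLocalCluster(viewOnA).slingId); // A
        System.out.println(leaderOfLocalCluster(viewOnB).slingId); // B
    }
}
```

Under these assumptions both A and B end up with a view that names a leader, and the two leaders differ, which is the duplicate-leader situation described above.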
Which brings me back to the original intent of removing the isolation mode: the
intent was to ensure that the +isolated mode does not have a leader+! Is there
agreement on this goal?
> pseudo network partition causes job deserialization issue in a cluster (when
> reading while job is being reassigned)
> -------------------------------------------------------------------------------------------------------------------
>
> Key: SLING-3432
> URL: https://issues.apache.org/jira/browse/SLING-3432
> Project: Sling
> Issue Type: Bug
> Components: Extensions
> Affects Versions: Discovery Impl 1.0.2
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: Discovery Impl 1.1.2
>
>
> There is a race condition between two instances in a cluster (e.g. Oak or CRX):
> Instance 1 is writing a job with a binary property, instance 2 is reading the
> job (likely triggered by discovery sending it a TOPOLOGY_CHANGED event). It
> looks like instance 2 is reading the job at about the moment when instance 1
> is still in the process of writing the job, or at least the binary,
> resulting in the following exception:
> 04.03.2014 06:55:39.667 *WARN* [Apache Sling Job Background Loader]
> org.apache.sling.event.impl.jobs.JobManagerImpl Unable to read job from
> /var/eventing/jobs/assigned/e4337f8f-47d2-41df-b3ab-0d40b1b2acd4/slingevent:eventadmin/2014/3/3/8/45/cq.wcm.msm.job.pageEvent_9718d7db-85b4-4930-a2ba-11a80d772970_172
> java.lang.Exception: Unable to deserialize property 'pageEvent'
> at
> org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:213)
> at
> org.apache.sling.event.impl.jobs.JobManagerImpl.readJob(JobManagerImpl.java:538)
> at
> org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobInTheBackground(BackgroundLoader.java:318)
> at
> org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobsInTheBackground(BackgroundLoader.java:294)
> at
> org.apache.sling.event.impl.jobs.BackgroundLoader.run(BackgroundLoader.java:203)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException: null
> at
> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2280)
> at
> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2749)
> at
> java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:779)
> at java.io.ObjectInputStream.<init>(ObjectInputStream.java:279)
> at
> org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:208)
> ... 5 common frames omitted
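The EOFException thrown from readStreamHeader in the trace above is what plain Java serialization produces when the stream ends before the header can be read. The following sketch reproduces that with in-memory byte arrays. It assumes that the reader saw an empty binary, which the trace suggests but does not prove, and TruncatedReadSketch and its methods are hypothetical names, not Sling code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Illustration only: shows how an incompletely written binary fails on read.
final class TruncatedReadSketch {

    // Serializes a value the way a writer would once it has finished.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads a value back; the ObjectInputStream constructor reads the stream header first.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Returns true if reading the given bytes ends in an EOFException.
    static boolean failsWithEof(byte[] data) {
        try {
            deserialize(data);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] complete = serialize("pageEvent");
        byte[] notYetWritten = new byte[0]; // assumed state seen by the second instance

        System.out.println(failsWithEof(complete));      // false
        System.out.println(failsWithEof(notYetWritten)); // true
    }
}
```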