[ https://issues.apache.org/jira/browse/SLING-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745564#comment-14745564 ]
Stefan Egli commented on SLING-5027:
------------------------------------
Note that this loop has two costs. First, it is an additional resource load
while it lasts: typically only a few seconds, but, as mentioned, this can be
longer if the cluster experiences delays, which may also be what caused the
vote in the first place (e.g. a heartbeat timeout). Second, it has a higher
risk of running into a conflict that would not occur without the votedAt
property. Here's an example of the conflict when running on Oak:
{code}
Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakState0001: Unresolved conflicts in /var/discovery/impl/ongoingVotings
    at org.apache.jackrabbit.oak.plugins.commit.ConflictValidator.failOnMergeConflict(ConflictValidator.java:115)
    at org.apache.jackrabbit.oak.plugins.commit.ConflictValidator.propertyAdded(ConflictValidator.java:84)
    at org.apache.jackrabbit.oak.spi.commit.CompositeEditor.propertyAdded(CompositeEditor.java:83)
    at org.apache.jackrabbit.oak.spi.commit.EditorDiff.propertyAdded(EditorDiff.java:82)
    at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:375)
    at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
    at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:396)
    at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
    at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:396)
    at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
    at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:396)
    at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
    at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:396)
    at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:52)
    at org.apache.jackrabbit.oak.spi.commit.EditorHook.processCommit(EditorHook.java:54)
    at org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
    at org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
    at org.apache.jackrabbit.oak.spi.state.AbstractNodeStoreBranch$InMemory.merge(AbstractNodeStoreBranch.java:557)
    at org.apache.jackrabbit.oak.spi.state.AbstractNodeStoreBranch.merge0(AbstractNodeStoreBranch.java:329)
    at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.merge(DocumentNodeStoreBranch.java:148)
    at org.apache.jackrabbit.oak.plugins.document.DocumentRootBuilder.merge(DocumentRootBuilder.java:159)
    at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.merge(DocumentNodeStore.java:1473)
    at org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:247)
    at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:391)
    at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:537)
{code}
> vote loop until vote is promoted
> --------------------------------
>
> Key: SLING-5027
> URL: https://issues.apache.org/jira/browse/SLING-5027
> Project: Sling
> Issue Type: Bug
> Components: Extensions
> Affects Versions: Discovery Impl 1.1.0
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: Discovery Impl 1.1.8
>
>
> {{VotingHandler.analyzeVotings}} risks running into a busy vote loop for as
> long as a voting exists under {{/ongoingVotings}}:
> * {{analyzeVotings}} is invoked either when something changes in
> {{/var/discovery/impl/ongoingVotings}} or as part of a heartbeat
> * as part of this, it determines the ongoing votings, decides which one it
> has to vote on (should there be more than one; it does not vote if it was
> the initiator), and then potentially does a {{vote(,true)}} on it
> * {{vote()}} not only sets the {{vote}} property accordingly (true or
> false); since
> [SLING-3434|https://issues.apache.org/jira/browse/SLING-3434?focusedCommentId=14297078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14297078]
> it additionally sets the {{votedAt}} property (a timestamp)
> * since the above is a change in {{/ongoingVotings}}, it triggers an
> observation event, which again causes {{analyzeVotings}} to be called,
> which again finds an ongoing voting, votes on it, triggers another
> observation event, and so on. The result is a busy loop that involves the
> repository and an observation handler
> * this loop only occurs since the introduction of the {{votedAt}} property,
> as its value changes on each iteration. Without it, voting again with the
> same boolean would not result in an observation event and the loop would
> not happen at all.
> * in any case, the loop lasts at most until the initiator has received all
> the votes and can promote the voting to an {{/establishedView}}. This should
> typically be a fast operation, but if the cluster is under heavy load and
> experiences delays for some reason, the busy loop can last a while.
> So this loop is a regression introduced with SLING-3434.
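
As a rough illustration of the loop described above, here is a toy Java model.
It is not the Discovery Impl code: {{ToyNode}}, {{simulate}} and the
{{cyclesUntilPromotion}} limit are hypothetical names made up for this sketch.
The only behaviour it assumes is the one stated in the description, namely
that a property write results in an observation event only if the stored value
actually changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class VoteLoopSketch {

    /** Hypothetical stand-in for a voting node in the repository. */
    static final class ToyNode {
        private final Map<String, Object> props = new HashMap<>();

        /** Stores the value and returns true only if it differs from the old one. */
        boolean set(String name, Object value) {
            Object old = props.put(name, value);
            return !Objects.equals(old, value);
        }
    }

    /**
     * Runs analyze/vote cycles until either nothing changes any more or the
     * voting is (hypothetically) promoted after cyclesUntilPromotion cycles.
     * Returns the number of observation events that were fired.
     */
    static int simulate(boolean writeVotedAt, int cyclesUntilPromotion) {
        ToyNode voting = new ToyNode();
        int events = 0;
        long clock = 0;          // fake timestamp source, advances on every vote
        boolean pending = true;  // initial trigger: a heartbeat or a first change

        for (int cycle = 0; pending && cycle < cyclesUntilPromotion; cycle++) {
            // analyzeVotings finds the ongoing voting and votes "true" on it
            boolean changed = voting.set("vote", Boolean.TRUE);
            if (writeVotedAt) {
                // the timestamp differs on every iteration, so this always changes
                changed |= voting.set("votedAt", ++clock);
            }
            if (changed) {
                events++;        // a change fires an observation event ...
            }
            pending = changed;   // ... which triggers analyzeVotings again
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println("with votedAt:    " + simulate(true, 50));
        System.out.println("without votedAt: " + simulate(false, 50));
    }
}
```

In this model the run with {{votedAt}} fires an event on every cycle until the
promotion limit is reached (50 events here), while the run without it fires a
single event for the initial vote and then goes quiet, because re-writing the
same boolean changes nothing.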