[ https://issues.apache.org/jira/browse/SOLR-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810887#comment-16810887 ]
Andrzej Bialecki commented on SOLR-13376:
------------------------------------------
Hmm, indeed there's a race condition here.
The reason for having more than one node attempt to create a nodeLost marker is
that more than one node may go away. We picked 3 as a magic number ;) that we
felt wasn't excessive and still reduced the chance of losing the event due to
multiple node failures.
This cleanup of leftover markers in {{OverseerTriggerThread}} dates from when
we first introduced this functionality, and it may not be necessary anymore:
{{InactiveMarkersPlanAction}} runs periodically to remove stale markers.
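
The race discussed in this issue can be replayed with a minimal in-memory sketch in Java. This is not Solr code: the class {{NodeLostMarkerRace}}, the method {{simulate}} and the constant {{MAX_MARKER_WRITERS}} are hypothetical names chosen for illustration, and a plain set stands in for the marker znodes in ZK. It only shows one possible ordering, in which a late liveNodes listener re-creates a marker after the Overseer has already cleaned it up.

```java
import java.util.Set;
import java.util.TreeSet;

/** In-memory illustration only; a Set stands in for the nodeLost marker znodes. */
class NodeLostMarkerRace {
    // Hypothetical cap mirroring the "magic number" of nodes that try to create a marker.
    static final int MAX_MARKER_WRITERS = 3;

    /**
     * Replays one fixed interleaving and returns the markers left over afterwards.
     *
     * @param slowListener if true, one surviving node's liveNodes listener
     *                     fires only after the Overseer cleanup has run
     */
    static Set<String> simulate(boolean slowListener) {
        Set<String> markers = new TreeSet<>(); // stand-in for marker znodes in ZK
        String lostNode = "node1";

        // Step 1: surviving nodes notice the loss and create the marker.
        // Creation is idempotent here, like a create-if-absent call.
        int promptWriters = slowListener ? MAX_MARKER_WRITERS - 1 : MAX_MARKER_WRITERS;
        for (int i = 0; i < promptWriters; i++) {
            markers.add(lostNode);
        }

        // Step 2: the new Overseer processes the marker and removes it.
        markers.remove(lostNode);

        // Step 3: the late listener has not observed the cleanup and
        // re-creates the marker, leaving a stale entry behind.
        if (slowListener) {
            markers.add(lostNode);
        }
        return markers;
    }

    public static void main(String[] args) {
        System.out.println("prompt listeners:  " + simulate(false));
        System.out.println("one late listener: " + simulate(true));
    }
}
```

In this sketch the stale marker survives only when a listener runs after the cleanup, which is the kind of leftover that a periodic sweep would be expected to remove.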
> Multi-node race condition to create/remove nodeLost markers
> -----------------------------------------------------------
>
> Key: SOLR-13376
> URL: https://issues.apache.org/jira/browse/SOLR-13376
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Assignee: Andrzej Bialecki
> Priority: Major
>
> NodeMarkersRegistrationTest.testNodeMarkersRegistration is frequently failing
> on Jenkins builds in the same spot, with similar-looking logs.
> Although I haven't been able to reproduce these failures locally, I am fairly
> confident that the problem is a race condition between when/how a new
> Overseer processes and cleans up "nodeLost" markers in ZK and how other
> nodes may (mistakenly) re-create those markers in their liveNodes listener.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)