[ https://issues.apache.org/jira/browse/SOLR-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813718#comment-16813718 ]

Andrzej Bialecki  commented on SOLR-13376:
------------------------------------------

[~hossman] - this patch changes {{OverseerTriggerThread}} so that it no longer
removes markers once it has finished initializing all triggers. Instead, it only
marks them "inactive". This kills two birds with one stone. It prevents straggler
nodes from re-creating these markers, and it lets triggers avoid processing them
multiple times (across multiple Overseer leader changes). It also speeds up
removal of markers in {{InactiveMarkersPlanAction}}.
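
A minimal Python sketch of the idea above follows. It assumes a plain in-memory dict as a stand-in for the ZooKeeper nodeLost marker znodes. The names (MarkerStore, overseer_init_triggers, cleanup_inactive) and the "active"/"inactive" state strings are hypothetical and are not Solr APIs. The real patch is Java code working against ZK. This sketch only models why keeping an inactive marker blocks re-creation by a straggler and avoids reprocessing after a leader change.

```python
import threading

# Hypothetical marker states; the patch's actual on-ZK representation may differ.
ACTIVE = "active"
INACTIVE = "inactive"


class MarkerStore:
    """Toy stand-in for the nodeLost marker znodes, keyed by node name."""

    def __init__(self):
        # The lock makes each operation atomic, loosely like a single ZK request.
        self._lock = threading.Lock()
        self._markers = {}  # node name -> ACTIVE or INACTIVE

    def create_if_absent(self, node):
        """Mimics a create that fails when the znode already exists.
        Returns True only if a new (active) marker was created."""
        with self._lock:
            if node in self._markers:
                return False
            self._markers[node] = ACTIVE
            return True

    def mark_inactive(self, node):
        with self._lock:
            if node in self._markers:
                self._markers[node] = INACTIVE

    def remove(self, node):
        with self._lock:
            self._markers.pop(node, None)

    def nodes_in_state(self, state):
        with self._lock:
            return sorted(n for n, s in self._markers.items() if s == state)


def overseer_init_triggers(store):
    """A new Overseer leader hands active markers to its triggers, then marks
    them inactive instead of deleting them."""
    processed = store.nodes_in_state(ACTIVE)
    for node in processed:
        store.mark_inactive(node)
    return processed


def cleanup_inactive(store):
    """Rough analogue of a cleanup plan action: delete markers already marked inactive."""
    removed = store.nodes_in_state(INACTIVE)
    for node in removed:
        store.remove(node)
    return removed


# Scenario: node1 is lost, and a liveNodes listener records the marker.
store = MarkerStore()
store.create_if_absent("node1")
first_pass = overseer_init_triggers(store)

# A straggler node's listener fires late and tries to re-create the marker.
recreated = store.create_if_absent("node1")

# The Overseer leader changes; the new leader finds no active marker for node1.
second_pass = overseer_init_triggers(store)

# Later, cleanup removes the inactive marker.
removed = cleanup_inactive(store)
print(first_pass, recreated, second_pass, removed)
```

In this sketch, the marker's continued existence is what makes the straggler's create fail. Its "inactive" state is what lets a later Overseer skip it.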

I also added some Ref Guide documentation about the maintenance trigger. I'd 
appreciate a review.

> Multi-node race condition to create/remove nodeLost markers
> -----------------------------------------------------------
>
>                 Key: SOLR-13376
>                 URL: https://issues.apache.org/jira/browse/SOLR-13376
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>         Attachments: SOLR-13376.patch
>
>
> NodeMarkersRegistrationTest.testNodeMarkersRegistration is frequently failing 
> on Jenkins builds in the same spot, with similar-looking logs.
> Although I haven't been able to reproduce these failures locally, I am fairly 
> confident that the problem is a race condition between when/how a new 
> Overseer processes & cleans up "nodeLost" markers in ZK and how other nodes 
> may (mistakenly) re-create those markers in their liveNodes listener.
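
To make the suspected race concrete, here is a deterministic Python replay of one possible interleaving. It assumes the pre-patch behavior, in which the Overseer deletes markers after processing them. A set stands in for the ZK nodeLost markers, and the function names are hypothetical. Steps that would run on different nodes are simply executed in a fixed order.

```python
def listener_create_marker(markers, lost_node):
    """A node's liveNodes listener records a nodeLost marker (idempotent add)."""
    markers.add(lost_node)


def overseer_process_and_delete(markers):
    """Pre-patch style: the Overseer processes all markers, then deletes them."""
    processed = sorted(markers)
    markers.clear()
    return processed


markers = set()
processing_log = []

# 1. nodeA's listener sees node1 disappear and creates the marker.
listener_create_marker(markers, "node1")
# 2. The Overseer processes the marker and removes it.
processing_log += overseer_process_and_delete(markers)
# 3. nodeB's listener fires late (a straggler) and re-creates the marker.
listener_create_marker(markers, "node1")
# 4. After a leader change, the new Overseer processes node1 a second time.
processing_log += overseer_process_and_delete(markers)

print(processing_log)
```

Because deletion and re-creation are not coordinated, the same lost node can be handled twice. A test that expects the marker to be gone can also observe it reappear.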



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
