[jira] [Commented] (SOLR-13376) Multi-node race condition to create/remove nodeLost markers

Hoss Man (JIRA) Mon, 08 Apr 2019 16:57:12 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812880#comment-16812880
 ]


Hoss Man commented on SOLR-13376:
---------------------------------

{quote}This cleaning of leftover markers in OverseerTriggerThread was added 
early on when we added this functionality, and it may not be necessary anymore 
- there's InactiveMarkersPlanAction that runs periodically to remove stale 
markers.
{quote}
If this test doesn't reflect reality, and it's expected that 
{{InactiveMarkersPlanAction}} is what will clean up the markers, then the test 
needs to be fixed – because right now it (like many other autoscaling tests) 
goes out of its way to disable the {{.scheduled_maintenance}} trigger.

For the record, this is the *exact* question I asked you about when you first 
resolved SOLR-13072 (but initially left this test marked AwaitsFix), but you 
never replied .. you just re-enabled the test (w/o any modifications to it) and 
re-resolved this issue...

https://issues.apache.org/jira/browse/SOLR-13072?focusedCommentId=16732499&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16732499

Also: AFAICT there is *nothing* in the ref-guide that mentions the 
{{.scheduled_maintenance}} trigger, or any of its (default) actions 
({{inactive_shard_plan}}, {{inactive_markers_plan}}, {{execute_plan}}), or what 
they do, or why they may be important for cleaning up things like the 
nodeLost/nodeAdded markers.  That seems like a problematic omission?

> Multi-node race condition to create/remove nodeLost markers
> -----------------------------------------------------------
>
>                 Key: SOLR-13376
>                 URL: https://issues.apache.org/jira/browse/SOLR-13376
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>
> NodeMarkersRegistrationTest.testNodeMarkersRegistration is frequently failing 
> on Jenkins builds in the same spot, with similar-looking logs.
> Although I haven't been able to reproduce these failures locally, I am fairly 
> confident that the problem is a race condition bug that exists between 
> when/how a new Overseer will process & clean up "nodeLost" markers in ZK, 
> and how other nodes may (mistakenly) re-create those markers in their 
> liveNodes listener.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

