[
https://issues.apache.org/jira/browse/SOLR-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492931#comment-17492931
]
Chris M. Hostetter commented on SOLR-16013:
-------------------------------------------
I started digging into this because Joel and I noticed a weird situation that
would pop up occasionally when {{ADDREPLICA}} commands were sent to a
cluster while nodes were shutting down (or restarting; this is in
Kubernetes). Sometimes N {{ADDREPLICA}} commands would become N+1 {{CREATE}}
core commands. We traced the logs down to the overseer logging that it's
adding a replica around the same time that it starts shutting down; then a new
node becomes the overseer and also says it's adding a replica before the
original overseer has logged that it's finished.
----
It seems pretty straightforward to me that {{ZkController}} should wait for
{{IOUtils.closeQuietly(overseer)}} to complete, before calling
{{IOUtils.closeQuietly(overseerElector.getContext())}} ... does anyone have any
idea why this _isn't_ the case?
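
To make the ordering I have in mind concrete, here is a minimal, self-contained sketch. It is not the actual {{ZkController}} code and not a proposed patch: the class {{OrderedShutdownSketch}} and the helpers {{closeInOrder}} and {{demo}} are hypothetical names, and the two {{AutoCloseable}} lambdas are only stand-ins for the {{Overseer}} object and the election context. The one idea it illustrates is that putting both closes in a single submitted task guarantees the first finishes before the second starts, which two independent {{submit}} calls on a multi-threaded pool do not.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class OrderedShutdownSketch {

  /**
   * Hypothetical helper: closes the given resources one after another inside a
   * single pool task, so resource i+1 is not closed until resource i has
   * finished closing. Exceptions are swallowed, in the spirit of closeQuietly.
   */
  static Future<?> closeInOrder(ExecutorService pool, AutoCloseable... resources) {
    return pool.submit(() -> {
      for (AutoCloseable resource : resources) {
        try {
          if (resource != null) {
            resource.close();
          }
        } catch (Exception e) {
          // deliberately ignored: a failed close must not block the next one
        }
      }
    });
  }

  /** Runs the sketch with stand-in resources and returns the observed close order. */
  static List<String> demo() {
    List<String> events = new CopyOnWriteArrayList<>();

    // Stand-in for the Overseer: closing is slow because it waits for in-flight work.
    AutoCloseable overseer = () -> {
      Thread.sleep(50);
      events.add("overseer closed");
    };
    // Stand-in for the election context: closing gives up the election node.
    AutoCloseable electionContext = () -> events.add("election node released");

    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // Overseer first, election context second, even though the pool has 2 threads.
      closeInOrder(pool, overseer, electionContext).get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException("shutdown sketch did not complete", e);
    } finally {
      pool.shutdown();
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this sketch the election node is only released after the slow overseer close has returned, so no other node could be elected while the stand-in overseer is still finishing its work. Whether the real shutdown path has other constraints that argue against this ordering is exactly what I'm asking about.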
> Overseer gives up election node before closing - inflight commands can be
> processed twice
> -----------------------------------------------------------------------------------------
>
> Key: SOLR-16013
> URL: https://issues.apache.org/jira/browse/SOLR-16013
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
>
> {{ZkController}} shutdown currently has these two lines (in this order)...
> {code:java}
> customThreadPool.submit(() ->
> IOUtils.closeQuietly(overseerElector.getContext()));
> customThreadPool.submit(() -> IOUtils.closeQuietly(overseer));
> {code}
> AFAICT this means that the overseer nodeX will give up its
> election node (via overseerElector), allowing some other nodeY to be elected as
> the new overseer, **BEFORE** Overseer nodeX shuts down its {{Overseer}} object,
> which waits for the {{OverseerThread}} to finish processing any tasks in
> progress.
> In practice, this seems to make it possible for a single command in the
> overseer queue to get processed twice.