[ 
https://issues.apache.org/jira/browse/SOLR-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned SOLR-15106:
-----------------------------------

    Assignee: David Smiley

> Thread in OverseerTaskProcessor should not "return"
> ---------------------------------------------------
>
>                 Key: SOLR-15106
>                 URL: https://issues.apache.org/jira/browse/SOLR-15106
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 8.6, 9.0
>            Reporter: Mathieu Marie
>            Assignee: David Smiley
>            Priority: Major
>
> I have encountered a scenario where ZK was not accessible for a long time 
> (caused by a _jute.maxbuffer_ problem, which is unrelated to the rest of this issue).
> During that time, the ClusterStateUpdater and OC queues from the Overseer got 
> filled with 1200+ messages.
> Once we restored ZK availability, the ClusterStateUpdater queue got emptied, 
> but not the OC one.
> The Overseer stopped dequeuing from the OC queue.
> After some digging in the code it seems that a *return* from the overseer 
> thread starting the runners could be the issue.
> Code in OverseerTaskProcessor.java 
> (https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java#L357)
> The lines of code that immediately follow should also be reviewed carefully, 
> as they also return from or interrupt the thread that is responsible for 
> executing the runners.
> Anyhow, if anybody hits the same issue, the quick workaround is to restart the 
> overseer instance so that a new overseer is elected on another node.
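
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the OverseerTaskProcessor code: the class DispatcherLoopSketch, the FlakyQueue helper, and both drain methods are hypothetical names invented for illustration, and the simulated outage stands in for ZK being unreachable. It only shows why a "return" inside a long-running dispatch loop leaves queued work stranded, while retrying keeps the loop alive.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DispatcherLoopSketch {

    /** Hypothetical stand-in for a work queue whose backing store can fail transiently. */
    static class FlakyQueue {
        private final Queue<String> items = new ArrayDeque<>();
        private int failuresLeft;

        FlakyQueue(int size, int failures) {
            for (int i = 0; i < size; i++) {
                items.add("task-" + i);
            }
            this.failuresLeft = failures;
        }

        /** Throws while simulated failures remain, then behaves like Queue.poll(). */
        String poll() {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IllegalStateException("simulated store outage");
            }
            return items.poll();
        }

        int remaining() {
            return items.size();
        }
    }

    /** Fragile variant: the first failure ends the loop, so nothing is dequeued afterwards. */
    static int drainReturningOnError(FlakyQueue queue) {
        int processed = 0;
        while (true) {
            String task;
            try {
                task = queue.poll();
            } catch (IllegalStateException e) {
                return processed; // the dispatcher is gone; remaining tasks stay queued
            }
            if (task == null) {
                return processed; // queue drained
            }
            processed++;
        }
    }

    /** Resilient variant: a failure skips this iteration and the loop tries again. */
    static int drainRetryingOnError(FlakyQueue queue) {
        int processed = 0;
        while (true) {
            String task;
            try {
                task = queue.poll();
            } catch (IllegalStateException e) {
                continue; // real code would log and back off before retrying
            }
            if (task == null) {
                return processed; // queue drained
            }
            processed++;
        }
    }

    public static void main(String[] args) {
        FlakyQueue stranded = new FlakyQueue(5, 1);
        System.out.println("return on error: processed "
                + drainReturningOnError(stranded) + ", left " + stranded.remaining());

        FlakyQueue drained = new FlakyQueue(5, 1);
        System.out.println("retry on error:  processed "
                + drainRetryingOnError(drained) + ", left " + drained.remaining());
    }
}
```

With one simulated failure and five queued tasks, the first variant processes nothing and leaves all five tasks in the queue, which mirrors the symptom reported here. The second variant drains the queue once the store recovers. Whether retrying, backing off, or giving up the overseer role is the right reaction in Solr itself is a design decision for the actual fix.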



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
