[
https://issues.apache.org/jira/browse/SOLR-16454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619723#comment-17619723
]
Patson Luk commented on SOLR-16454:
-----------------------------------
The details can be found in
[https://github.com/fullstorydev/lucene-solr/pull/142] :)
Quick summary:
The {{org.apache.zookeeper.KeeperException$NoNodeException}} is sometimes
thrown from the {{completedMap}} field, of type {{SizeLimitedDistributedMap}}, in
{{OverseerTaskProcessor}} while it performs cleanup
[here|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/cloud/SizeLimitedDistributedMap.java#L91-L98].
The reason: multiple threads can enter the same code block and try to delete
the same list of children, so a slower thread can attempt to delete a child node
that no longer exists.
The proposed solution is to be a bit more forgiving with this exception, using a
catch block such as
{{} catch (KeeperException.NoNodeException e) {}}
{{//this could happen if multiple threads try to clean the same map}}
{{}}}
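
To illustrate the idea outside of Solr, here is a minimal, self-contained sketch. It does not use the real ZooKeeper client: {{FakeNodeStore}}, the local {{NoNodeException}} class and the {{cleanup}} helper are hypothetical stand-ins invented for this example, and the sketch deletes every child in the snapshot rather than trimming to a size limit. Two threads work from the same snapshot of children, and the slower one simply skips nodes that are already gone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TolerantCleanupSketch {

    /** Hypothetical stand-in for KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("no node: " + path);
        }
    }

    /** Hypothetical in-memory stand-in for the children of one ZooKeeper node. */
    static class FakeNodeStore {
        private final Set<String> children = ConcurrentHashMap.newKeySet();

        void create(String name) {
            children.add(name);
        }

        List<String> listChildren() {
            return new ArrayList<>(children);
        }

        /** Fails if the child has already been removed, like a delete on a missing node. */
        void delete(String name) throws NoNodeException {
            if (!children.remove(name)) {
                throw new NoNodeException(name);
            }
        }

        int size() {
            return children.size();
        }
    }

    /**
     * Deletes every child named in the snapshot and skips children that
     * another thread removed first. Returns how many this caller deleted.
     */
    static int cleanup(FakeNodeStore store, List<String> snapshot) {
        int deleted = 0;
        for (String child : snapshot) {
            try {
                store.delete(child);
                deleted++;
            } catch (NoNodeException e) {
                // this could happen if multiple threads try to clean the same map
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws InterruptedException {
        FakeNodeStore store = new FakeNodeStore();
        for (int i = 0; i < 100; i++) {
            store.create("node-" + i);
        }

        // Both threads work from the same list of children, as in the race described above.
        List<String> snapshot = store.listChildren();
        AtomicInteger totalDeleted = new AtomicInteger();

        Thread first = new Thread(() -> totalDeleted.addAndGet(cleanup(store, snapshot)));
        Thread second = new Thread(() -> totalDeleted.addAndGet(cleanup(store, snapshot)));
        first.start();
        second.start();
        first.join();
        second.join();

        // Each child is removed exactly once, whichever thread gets there first.
        System.out.println("remaining=" + store.size() + " deleted=" + totalDeleted.get());
    }
}
```

Running {{main}} prints {{remaining=0 deleted=100}}: neither thread fails, and the split of deletions between the two threads varies from run to run. Without the catch block, whichever thread loses the race on a given child would abort its cleanup with an exception.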
> Fixed race condition that trigger error on SizeLimitedDistributedMap …
> ----------------------------------------------------------------------
>
> Key: SOLR-16454
> URL: https://issues.apache.org/jira/browse/SOLR-16454
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 9.1
> Reporter: Hitesh Khamesra
> Priority: Major
>
> here
> [https://github.com/apache/solr/blob/19f109842fb34069346a9efb21cf01b6706830a8/solr/core/src/java/org/apache/solr/cloud/SizeLimitedDistributedMap.java#L94]
>
> We should catch the ZK exception, as it can lead to weird race conditions.
>
> [~patson] Can you please add the details
--
This message was sent by Atlassian Jira
(v8.20.10#820010)