[
https://issues.apache.org/jira/browse/SOLR-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742707#comment-17742707
]
ASF subversion and git services commented on SOLR-16871:
--------------------------------------------------------
Commit fa024e8cbcde73aba91c34b7aa47bbed795d8b79 in solr's branch
refs/heads/main from patsonluk
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=fa024e8cbcd ]
SOLR-16871: Race condition in `CoordinatorHttpSolrCall` synthetic
collection/replica init (#1762)
> Race condition for coordinator node init
> ----------------------------------------
>
> Key: SOLR-16871
> URL: https://issues.apache.org/jira/browse/SOLR-16871
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Patson Luk
> Priority: Major
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> A unit test [that issues concurrent select queries to coordinator
> nodes|https://github.com/cowpaths/fullstory-solr/blob/e4226eb8fa2afb01d7615f7faea01f71b144cd58/solr/core/src/test/org/apache/solr/search/TestCoordinatorRole.java#L486]
> revealed three possible race conditions:
> 1. If multiple concurrent requests find that the synthetic collection has not
> yet been created, they might all attempt to create it. This could trigger a
> SolrException with `collection already exists`.
> 2. Similarly, if multiple concurrent requests find no replica of the
> synthetic collection on the current node (the multiple coordinator node
> scenario), CoordinatorHttpSolrCall#addReplica could be invoked multiple
> times. This should not trigger any exception, but it would create multiple
> replicas for the same node in the synthetic collection.
> 3. The existing logic
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L102]
> assumes that if
> syntheticColl.getReplicas(solrCall.cores.getZkController().getNodeName())
> returns a non-empty result, then the following call
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L112]
> returns a core. Unfortunately, the first call can return a non-empty list
> that contains only a DOWN replica if another request is in the process of
> creating that replica. In this case,
> solrCall.getCoreByCollection(syntheticCollectionName, isPreferLeader) calls
> super.getCoreByCollection
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L69],
> which returns null (the super implementation only returns active replicas).
> CoordinatorHttpSolrCall#getCoreByCollection then ends up calling
> CoordinatorHttpSolrCall#getCore, introducing an infinite loop that ends in a
> stack overflow.
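
The three races quoted above can be illustrated with a small, self-contained
sketch. It is not the committed fix and does not use Solr's API: the class
SyntheticInitSketch, its in-memory map standing in for cluster state, and all
method names are hypothetical. It only shows one possible shape of a remedy,
namely doing the check and the create under one lock (races 1 and 2) and
polling with a deadline until the replica is ACTIVE instead of re-entering
the lookup (race 3).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory stand-in for cluster state; not Solr's API. */
class SyntheticInitSketch {
    enum State { DOWN, ACTIVE }

    // collection name -> (node name -> replica state)
    private final Map<String, Map<String, State>> collections = new ConcurrentHashMap<>();
    private final Object initLock = new Object();

    // Counters so a caller can observe how often each step actually ran.
    final AtomicInteger createCalls = new AtomicInteger();
    final AtomicInteger addReplicaCalls = new AtomicInteger();

    /** Races 1 and 2: check and create under one lock, so each step runs once. */
    void ensureCollectionAndReplica(String coll, String node) {
        synchronized (initLock) {
            Map<String, State> replicas = collections.get(coll);
            if (replicas == null) {
                createCalls.incrementAndGet();
                replicas = new ConcurrentHashMap<>();
                collections.put(coll, replicas);
            }
            if (!replicas.containsKey(node)) {
                addReplicaCalls.incrementAndGet();
                replicas.put(node, State.DOWN); // a new replica starts out DOWN
            }
        }
    }

    /** Simulates the replica finishing its startup. */
    void markActive(String coll, String node) {
        collections.get(coll).put(node, State.ACTIVE);
    }

    /**
     * Race 3: a replica that exists but is DOWN is waited for with a deadline.
     * Returns a core name, or null on timeout; it never recurses.
     */
    String getCore(String coll, String node, long timeoutMs) {
        ensureCollectionAndReplica(coll, node);
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (collections.get(coll).get(node) == State.ACTIVE) {
                return coll + "@" + node;
            }
            if (System.nanoTime() - deadline >= 0) {
                return null; // give up instead of looping forever
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```

A single JVM-local lock is only meaningful within one node; with several
coordinator nodes, creation of the collection itself would still need to
tolerate an "already exists" response from the cluster.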