[ https://issues.apache.org/jira/browse/SOLR-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746392#comment-17746392 ]
ASF subversion and git services commented on SOLR-16871:
--------------------------------------------------------
Commit ccc7ca65f12ee604c2194105b1b7c44822ad15ae in solr's branch
refs/heads/main from patsonluk
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=ccc7ca65f12 ]
SOLR-16871: Synchronize on a larger block to avoid race condition in
CoordinatorHttpSolrCall init (#1800)
* Synchronize to avoid race condition in CoordinatorHttpSolrCall
* ./gradlew tidy
> Race condition for coordinator node init
> ----------------------------------------
>
> Key: SOLR-16871
> URL: https://issues.apache.org/jira/browse/SOLR-16871
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Patson Luk
> Priority: Major
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> From a unit test case [that issues concurrent select queries to coordinator
> nodes|https://github.com/cowpaths/fullstory-solr/blob/e4226eb8fa2afb01d7615f7faea01f71b144cd58/solr/core/src/test/org/apache/solr/search/TestCoordinatorRole.java#L486],
> three possible race conditions were found:
> 1. If multiple concurrent requests find that the synthetic collection is not
> yet created, they might all attempt to create it. This could trigger a
> SolrException with `collection already exists`.
> 2. Similarly, if multiple concurrent requests find there’s no replica of the
> synthetic collection for the current node (multiple coordinator node
> scenario), then CoordinatorHttpSolrCall#addReplica could be invoked multiple
> times. This should not trigger any exception, but would create multiple
> replicas for the same node in the synthetic collection.
> 3. The existing logic
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L102]
> assumes that if
> syntheticColl.getReplicas(solrCall.cores.getZkController().getNodeName())
> returns a non-empty result, then the following call
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L112]
> returns a core. Unfortunately, the first call can return a non-empty
> list that contains a DOWN replica if another request is in the process of
> creating that replica. In this case,
> solrCall.getCoreByCollection(syntheticCollectionName, isPreferLeader)
> calls super.getCoreByCollection
> [here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L69],
> which returns null (since the super implementation only returns active
> replicas). CoordinatorHttpSolrCall#getCoreByCollection then ends up calling
> CoordinatorHttpSolrCall#getCore, which recurses indefinitely and causes a
> stack overflow.
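
The check-then-act pattern behind the first two items, and the "synchronize on a larger block" idea named in the commit title, can be illustrated with a minimal single-JVM sketch. This is not the code from the commit: FakeCluster, ensureSyntheticReplica, runConcurrently and the node name are hypothetical stand-ins, and the real change may lock on a different object or cover a different range of statements. The point shown is only that when the existence checks and the creation steps sit inside one synchronized block, concurrent callers cannot both see "missing" and both create, and a later caller does not observe a half-created replica.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SyntheticCollectionInitSketch {

  /** Hypothetical stand-in for cluster state. This is NOT a Solr class. */
  static final class FakeCluster {
    private final Object initLock = new Object();
    private boolean collectionExists; // guarded by initLock
    private final Map<String, Integer> replicasPerNode = new HashMap<>(); // guarded by initLock
    final AtomicInteger createCollectionCalls = new AtomicInteger();
    final AtomicInteger addReplicaCalls = new AtomicInteger();

    /**
     * Both checks and both creation steps happen under one lock, so no other
     * caller can run between a check and the action that depends on it.
     */
    void ensureSyntheticReplica(String nodeName) {
      synchronized (initLock) {
        if (!collectionExists) {
          createCollectionCalls.incrementAndGet(); // stands in for creating the collection
          collectionExists = true;
        }
        if (!replicasPerNode.containsKey(nodeName)) {
          addReplicaCalls.incrementAndGet(); // stands in for adding a replica on this node
          replicasPerNode.merge(nodeName, 1, Integer::sum);
        }
      }
    }

    int replicaCount(String nodeName) {
      synchronized (initLock) {
        return replicasPerNode.getOrDefault(nodeName, 0);
      }
    }
  }

  /** Releases all worker threads at once to maximize contention on the init path. */
  static void runConcurrently(FakeCluster cluster, String nodeName, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        try {
          start.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        cluster.ensureSyntheticReplica(nodeName);
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    FakeCluster cluster = new FakeCluster();
    String node = "node1:8983_solr"; // hypothetical node name
    runConcurrently(cluster, node, 16);
    System.out.println("createCollection calls: " + cluster.createCollectionCalls.get());
    System.out.println("addReplica calls: " + cluster.addReplicaCalls.get());
    System.out.println("replicas on node: " + cluster.replicaCount(node));
  }
}
```

With the lock in place, each of the three printed counts is 1 regardless of thread scheduling. If the synchronized block were narrowed to cover only the creation steps and not the checks, several threads could pass the checks before any of them created anything, which is the behavior described in items 1 and 2.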