Patson Luk created SOLR-16871:
---------------------------------
Summary: Race condition for coordinator node init
Key: SOLR-16871
URL: https://issues.apache.org/jira/browse/SOLR-16871
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Reporter: Patson Luk
From a unit test case [that issues concurrent select queries to coordinator
nodes|https://github.com/cowpaths/fullstory-solr/blob/e4226eb8fa2afb01d7615f7faea01f71b144cd58/solr/core/src/test/org/apache/solr/search/TestCoordinatorRole.java#L486],
three possible race conditions were found:
1. If multiple concurrent requests find that the synthetic collection is not
yet created, they might all attempt to create it. This could trigger a
SolrException with `collection already exists`.
2. Similarly, if multiple concurrent requests find that there is no replica of
the synthetic collection for the current node (the multiple coordinator node
scenario), then CoordinatorHttpSolrCall#addReplica could be invoked multiple
times. This should not trigger any exception, but it would create multiple
replicas for the same node in the synthetic collection.
3. The existing logic
[here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L102]
assumes that if
syntheticColl.getReplicas(solrCall.cores.getZkController().getNodeName())
returns a non-empty result, then the following call
[here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L112]
should return a core. Unfortunately, the first call can return a non-empty
list that contains only a DOWN replica if another request is in the process of
creating that replica. In this case,
solrCall.getCoreByCollection(syntheticCollectionName, isPreferLeader) would
call super.getCoreByCollection
[here|https://github.com/cowpaths/fullstory-solr/blob/6c8531f08301a291478502c262499abed0d5075c/solr/core/src/java/org/apache/solr/servlet/CoordinatorHttpSolrCall.java#L69],
which would return null (since the super implementation only returns active
replicas). So CoordinatorHttpSolrCall#getCoreByCollection would end up calling
CoordinatorHttpSolrCall#getCore, introducing infinite recursion and causing a
stack overflow.
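
For illustration only, issues 1 and 2 are both cases of an "initialize once" step being run by several requests at the same time. A minimal sketch of one way to serialize such a step inside a single node is shown below. The class OncePerKey, its methods, and the key name are hypothetical and are not part of Solr or of CoordinatorHttpSolrCall; the sketch also only covers requests within one JVM, so it would not by itself prevent two different coordinator nodes from racing to create the collection.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not a Solr class): runs an init action at most once
// per key within a single JVM, using a per-key lock and a double check.
class OncePerKey {
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();
    private final Set<String> done = ConcurrentHashMap.newKeySet();

    void ensure(String key, Runnable init) {
        if (done.contains(key)) {
            return; // fast path: init already completed for this key
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            if (done.contains(key)) {
                return; // another thread finished init while we waited
            }
            init.run();    // if this throws, the key is not marked done
            done.add(key); // mark done only after init succeeded
        }
    }

    // Simulates concurrent requests that all try to run the same init step
    // and returns how many times the init action actually ran.
    static int demo(int threads) throws InterruptedException {
        OncePerKey guard = new OncePerKey();
        AtomicInteger created = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // "synthetic-collection" is an illustrative key name only.
                guard.ensure("synthetic-collection", created::incrementAndGet);
            });
        }
        start.countDown();
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("worker threads did not finish");
        }
        return created.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("init ran " + demo(8) + " time(s)");
    }
}
```

In this sketch the counter stands in for the "create collection" or "add replica" call. A per-key lock is used rather than running the action inside computeIfAbsent, because the init step here may be slow.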