patsonluk opened a new pull request, #1460:
URL: https://github.com/apache/solr/pull/1460

   https://issues.apache.org/jira/browse/SOLR-16701
   
   # Description
   
   This fixes a race condition during deletion of a PRS-enabled collection, which triggers the following exception:
   ```
   org.apache.solr.common.SolrException: Error fetching per-replica states
       at __randomizedtesting.SeedInfo.seed([C2BFFBF8FE49C1E1:F1C8D9E308D2745]:0)
       at app//org.apache.solr.common.cloud.PerReplicaStatesFetcher.fetch(PerReplicaStatesFetcher.java:49)
       at app//org.apache.solr.common.cloud.PerReplicaStatesFetcher$LazyPrsSupplier.lambda$new$0(PerReplicaStatesFetcher.java:62)
       at app//org.apache.solr.common.cloud.DocCollection$PrsSupplier.get(DocCollection.java:515)
       at app//org.apache.solr.common.cloud.Replica.isLeader(Replica.java:314)
       at app//org.apache.solr.common.cloud.Slice.findLeader(Slice.java:242)
       at app//org.apache.solr.common.cloud.Slice.setPrsSupplier(Slice.java:56)
       at app//org.apache.solr.common.cloud.DocCollection.<init>(DocCollection.java:123)
       at app//org.apache.solr.common.cloud.ClusterState.collectionFromObjects(ClusterState.java:305)
       at app//org.apache.solr.common.cloud.ClusterState.createFromCollectionMap(ClusterState.java:254)
       at app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.createFromJsonSupportingLegacyConfigName(ZkClientClusterStateProvider.java:117)
       at app//org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1695)
   ```
   
   This can be triggered by the following sequence:
   1. `fetchCollectionState` is called and fetches `state.json`.
   2. Before `fetchCollectionState` fetches the PRS entries, the collection's `state.json` and PRS entries are deleted concurrently by someone else.
   3. `fetchCollectionState` throws the exception above when it reaches the PRS fetching logic, because the ZK node `state.json` no longer exists.
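
The sequence above can be sketched with a toy two-step fetch. This is only an illustration under stated assumptions, not Solr code: `FakeZk`, `NaiveStateFetcher`, the `betweenReads` hook, and the single `/prs` key are all hypothetical stand-ins (the real PRS entries are not modeled here), and a plain map plays the role of ZooKeeper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy stand-in for ZooKeeper: a map from path to data. Hypothetical, not Solr code. */
class FakeZk {
  final Map<String, String> nodes = new ConcurrentHashMap<>();

  /** Returns the data at path, or throws if the node does not exist. */
  String read(String path) {
    String data = nodes.get(path);
    if (data == null) {
      throw new IllegalStateException("no node: " + path);
    }
    return data;
  }
}

/** Hypothetical fetcher that reads state.json and the PRS data in two separate steps. */
class NaiveStateFetcher {
  /** betweenReads runs in the window between the two reads, where the race can occur. */
  static String fetch(FakeZk zk, String collection, Runnable betweenReads) {
    String base = "/collections/" + collection;
    String state = zk.read(base + "/state.json");   // step 1: state.json is fetched
    betweenReads.run();                             // step 2: a concurrent delete may land here
    String prs = zk.read(base + "/state.json/prs"); // step 3: throws if the node is gone
    return state + "|" + prs;
  }
}
```

Passing `zk.nodes::clear` as the hook simulates the deletion landing between the two reads, so the second read fails even though the first succeeded.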
   
   
   # Solution
   
   Throw a dedicated exception, `PrsZkNodeNotFoundException` (which extends `SolrException`), when the PRS entries cannot be fetched. `ZkStateReader#fetchCollectionState` then catches this exception alongside the existing `KeeperException.NoNodeException` and applies the same handling to fetch the state again.
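
The general shape of this approach, a dedicated exception type that the caller treats as "retry the fetch", might look roughly like the sketch below. It is not the code in this PR: `PrsNodeMissingException`, `RetryingFetcher`, and the `maxAttempts` bound are hypothetical names introduced for illustration, and the supplier stands in for a single fetch attempt that returns `null` once the collection is gone.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a dedicated "PRS node not found" exception. */
class PrsNodeMissingException extends RuntimeException {
  PrsNodeMissingException(String message) {
    super(message);
  }
}

/** Hypothetical helper: retries a fetch when the PRS node vanished mid-fetch. */
class RetryingFetcher {
  /**
   * Runs fetchOnce up to maxAttempts times. A PrsNodeMissingException means the
   * state changed underneath the fetch, so the whole fetch is attempted again.
   * Any other exception propagates immediately.
   */
  static <T> T fetchWithRetry(Supplier<T> fetchOnce, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    PrsNodeMissingException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return fetchOnce.get(); // may legitimately return null (collection deleted)
      } catch (PrsNodeMissingException e) {
        last = e; // node vanished between reads: start over from the first read
      }
    }
    throw last; // still failing after all attempts
  }
}
```

Using a narrow exception type rather than catching a broad one keeps unrelated failures visible: only the "node disappeared mid-fetch" case is retried.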
   
   
   # Tests
   
   Added `ZkStateReaderTest#testDeletePrsCollection`, which reproduces this race condition and verifies that:
   1. `ZkStateReader#fetchCollectionState` does not throw an exception; instead, it eventually returns `null`, which indicates that the collection has been deleted.
   2. The `PrsZkNodeNotFoundException` was indeed triggered.
   
   
   Please note that the test case builds on the `Breakpoint` introduced by another PR: https://github.com/apache/solr/pull/1457
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

