[ https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-17275:
--------------------------------
    Priority: Blocker  (was: Major)

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-17275
>                 URL: https://issues.apache.org/jira/browse/SOLR-17275
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 9.6.0
>         Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>            Reporter: Rafał Harabień
>            Priority: Blocker
>             Fix For: 9.6.1
>
>         Attachments: image-2024-05-06-17-23-42-236.png
>
>
> I observe higher CloudSolrClient latency after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially at p99:
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (the new version was deployed at ~14:30):
> !image-2024-05-06-17-23-42-236.png!
> I took a thread dump and can see many threads blocked in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>       at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>       -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>       at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>       at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>       at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>       at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>       at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>       at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>       at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>       ...
>       Number of locked synchronizers = 1
>       - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time, the QTime reported by Solr hasn't changed, so I'm pretty 
> sure this is a client-side regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  being called for every request in my application. 
> [This|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so that 
> forceUpdateCollection gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 this was not the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (the collectionStates map contains only 
> collections). In my application all collections are referenced through 
> aliases, so I believe every request takes this path, which would explain the 
> regression in response time.
> I am not familiar enough with the code to prepare a PR, but I hope this 
> insight will be enough to fix the issue.
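
The mechanism described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not SolrJ code: StateCache, forceUpdate, refreshesFor and the names products_v1 / products are hypothetical stand-ins. It assumes a cache that holds real collections only and that falls back to a synchronized refresh whenever a lookup misses, which is the behaviour the report attributes to getState in 9.6.0.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AliasLookupSketch {

    /** Hypothetical stand-in for a cluster-state cache that only knows real collections. */
    static final class StateCache {
        private final Map<String, String> collectionStates = new ConcurrentHashMap<>();
        final AtomicInteger forcedRefreshes = new AtomicInteger();

        StateCache(String... collections) {
            for (String c : collections) {
                collectionStates.put(c, "state-of-" + c);
            }
        }

        // Slow path: serialized on the cache's monitor, so concurrent callers queue up.
        // A real client would do a remote round trip here; for an alias there is
        // nothing to load, so the cache stays empty for that name.
        synchronized void forceUpdate(String name) {
            forcedRefreshes.incrementAndGet();
        }

        // Lookup that treats every miss as "possibly stale" and forces a refresh.
        String getState(String name) {
            String state = collectionStates.get(name);
            if (state == null) {
                forceUpdate(name);
                state = collectionStates.get(name);
            }
            return state;
        }
    }

    /** Issues the given number of lookups and returns how many hit the slow path. */
    static int refreshesFor(String name, int requests) {
        StateCache cache = new StateCache("products_v1");
        for (int i = 0; i < requests; i++) {
            cache.getState(name);
        }
        return cache.forcedRefreshes.get();
    }

    public static void main(String[] args) {
        System.out.println("collection name: " + refreshesFor("products_v1", 1000)); // 0
        System.out.println("alias name:      " + refreshesFor("products", 1000));    // 1000
    }
}
```

In this model a lookup by collection name never reaches the synchronized method, while a lookup by alias reaches it on every request. With many request threads, that serialized slow path is where callers would pile up, which matches the BLOCKED threads in the dump above.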



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
