[ 
https://issues.apache.org/jira/browse/SOLR-18176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18072437#comment-18072437
 ] 

ASF subversion and git services commented on SOLR-18176:
--------------------------------------------------------

Commit b8a1f8223e13dd26ed3a84943e43380d51e1447e in solr's branch 
refs/heads/branch_10x from Matthew Biscocho
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=b8a1f8223e1 ]

SOLR-18176: HttpShardHandler query throughput bottleneck from ZooKeeper (#4237)

CloudReplicaSource made a cluster state call to ZooKeeper for every 
distributed request when a search spans multiple collections and the 
coordinator has no local replica for some of them. This happened because the 
get call bypassed the state cache. The result was a severe bottleneck in query 
throughput. The fix is small: enable cached state lookups.
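
The effect described in the commit message above can be illustrated with a 
minimal, self-contained sketch. It is not Solr's code: FakeStateStore, 
LazyStateRef, CachedLookupDemo and the 60-second TTL are hypothetical names 
and values chosen for illustration, and only the allowCached flag mirrors the 
idea mentioned in the issue.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for ZooKeeper: counts how often state is fetched. */
class FakeStateStore {
    private final AtomicInteger reads = new AtomicInteger();

    String fetchState(String collection) {
        reads.incrementAndGet(); // would be a network round trip in a real cluster
        return "state-of-" + collection;
    }

    int readCount() {
        return reads.get();
    }
}

/** Hypothetical lazy reference to one collection's state, with a TTL cache. */
class LazyStateRef {
    private final FakeStateStore store;
    private final String collection;
    private final long ttlNanos;
    private String cached;
    private long cachedAtNanos;

    LazyStateRef(FakeStateStore store, String collection, long ttlNanos) {
        this.store = store;
        this.collection = collection;
        this.ttlNanos = ttlNanos;
    }

    /** With allowCached=false every call goes to the remote store. */
    synchronized String get(boolean allowCached) {
        long now = System.nanoTime();
        boolean fresh = cached != null && now - cachedAtNanos < ttlNanos;
        if (allowCached && fresh) {
            return cached; // served locally, no remote read
        }
        cached = store.fetchState(collection);
        cachedAtNanos = now;
        return cached;
    }
}

class CachedLookupDemo {
    /** Runs the given number of lookups and returns how many hit the store. */
    static int countReads(boolean allowCached, int queries) {
        FakeStateStore store = new FakeStateStore();
        LazyStateRef ref =
            new LazyStateRef(store, "collection1", TimeUnit.SECONDS.toNanos(60));
        for (int i = 0; i < queries; i++) {
            ref.get(allowCached);
        }
        return store.readCount();
    }

    public static void main(String[] args) {
        System.out.println("uncached reads: " + countReads(false, 1000)); // 1000
        System.out.println("cached reads:   " + countReads(true, 1000));  // 1
    }
}
```

In this sketch, remote reads grow one-for-one with query count when the cache 
is bypassed, which matches the linear rise in ZooKeeper CPU described in the 
issue. With the cache allowed, reads are bounded by the TTL instead.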

> HttpShardHandler query throughput bottleneck from ZooKeeper
> -----------------------------------------------------------
>
>                 Key: SOLR-18176
>                 URL: https://issues.apache.org/jira/browse/SOLR-18176
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 10.0, 9.10.1
>            Reporter: Matthew Biscocho
>            Assignee: Matthew Biscocho
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2026-03-24-13-14-15-761.png
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> I found a significant query throughput bottleneck in a SolrCloud cluster 
> whose nodes share collections and are heavily sharded. As Solr query load 
> increased, ZooKeeper CPU utilization rose linearly with it. A JFR dump 
> showed that every distributed query in HttpShardHandler was doing a 
> synchronized get [without allowCache=true here
> |https://github.com/apache/solr/blob/2ea21db9af976eee8ed10c08fb95e071889387be/solr/core/src/java/org/apache/solr/handler/component/CloudReplicaSource.java#L192]
> for collection state from ZooKeeper. These reads eventually bottlenecked 
> ZooKeeper and held QTP threads, drastically worsening query latency.
> Changing the call to use the cache resulted in a large boost in query 
> throughput and a reduction in ZooKeeper CPU utilization.
> !image-2026-03-24-13-14-15-761.png!
> PR to follow.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
