[ 
https://issues.apache.org/jira/browse/SOLR-18176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18072427#comment-18072427
 ] 

ASF subversion and git services commented on SOLR-18176:
--------------------------------------------------------

Commit bfc04fc589dc5e8b00ed139b4e957aad710e1137 in solr's branch 
refs/heads/main from Matthew Biscocho
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=bfc04fc589d ]

SOLR-18176: HttpShardHandler query throughput bottleneck from ZooKeeper (#4237)

CloudReplicaSource was making a clusterstate call to ZooKeeper for every 
distributed request when a search spanned multiple collections and the 
coordinator had no local replica for some of them. This happened because the 
get call bypassed the state cache. The result was a severe bottleneck in query 
throughput, so this small fix enables cached state lookups.
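
To illustrate the effect described in the commit message above, here is a 
minimal Java sketch of a state lookup that either bypasses or uses a local 
cache. It is not Solr's code: RemoteStateStore, StateResolver, the allowCached 
flag, and the collection name are hypothetical stand-ins, and the remote store 
is an in-memory counter rather than ZooKeeper. The sketch also assumes state 
never changes, so it omits the invalidation a real cache would need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model for illustration; none of these names are Solr APIs.
class CachedStateLookupSketch {

    /** Stand-in for a remote store such as ZooKeeper; counts the reads it serves. */
    static final class RemoteStateStore {
        final AtomicInteger reads = new AtomicInteger();

        String fetchState(String collection) {
            reads.incrementAndGet(); // every call here models one remote round trip
            return "state-of-" + collection;
        }
    }

    /** Resolves collection state, optionally through a local cache. */
    static final class StateResolver {
        private final RemoteStateStore store;
        private final Map<String, String> cache = new ConcurrentHashMap<>();

        StateResolver(RemoteStateStore store) {
            this.store = store;
        }

        String getState(String collection, boolean allowCached) {
            if (!allowCached) {
                // Bypasses the cache: one remote read per request.
                return store.fetchState(collection);
            }
            // Uses the cache: only the first request for a collection goes remote.
            return cache.computeIfAbsent(collection, store::fetchState);
        }
    }

    /** Issues the given number of lookups and returns how many reached the remote store. */
    static int remoteReads(boolean allowCached, int requests) {
        RemoteStateStore store = new RemoteStateStore();
        StateResolver resolver = new StateResolver(store);
        for (int i = 0; i < requests; i++) {
            resolver.getState("remoteCollection", allowCached);
        }
        return store.reads.get();
    }

    public static void main(String[] args) {
        System.out.println("uncached remote reads: " + remoteReads(false, 1000));
        System.out.println("cached remote reads:   " + remoteReads(true, 1000));
    }
}
```

In this model the uncached path sends every request to the remote store, so 
remote load grows with query load, while the cached path sends only the first.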

> HttpShardHandler query throughput bottleneck from ZooKeeper
> -----------------------------------------------------------
>
>                 Key: SOLR-18176
>                 URL: https://issues.apache.org/jira/browse/SOLR-18176
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 10.0, 9.10.1
>            Reporter: Matthew Biscocho
>            Assignee: Matthew Biscocho
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2026-03-24-13-14-15-761.png
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> I found a significant query throughput bottleneck in a SolrCloud cluster 
> whose nodes share collections and are heavily sharded. As Solr query load 
> increased, ZooKeeper CPU utilization rose linearly with it. A JFR dump showed 
> that every distrib query in HttpShardHandler was doing a synchronized get for 
> collection state from ZooKeeper 
> [without allowCache=true here|https://github.com/apache/solr/blob/2ea21db9af976eee8ed10c08fb95e071889387be/solr/core/src/java/org/apache/solr/handler/component/CloudReplicaSource.java#L192]. 
> These gets eventually bottlenecked ZooKeeper reads and held QTP threads, 
> making query latency drastically worse.
> Changing the call to use the cache resulted in a huge boost in query 
> throughput and a reduction in ZooKeeper CPU utilization.
> !image-2026-03-24-13-14-15-761.png!
> PR to follow.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

