[ 
https://issues.apache.org/jira/browse/SOLR-17106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837877#comment-17837877
 ] 

David Smiley commented on SOLR-17106:
-------------------------------------

Just want to mention that we should probably have these settings choose default 
values via EnvUtils (a new thing) so we can conveniently make these settings 
adjustments via system properties or env vars, whichever is convenient to the 
user.

> LBSolrClient: Make it configurable to remove zombie ping checks
> ---------------------------------------------------------------
>
>                 Key: SOLR-17106
>                 URL: https://issues.apache.org/jira/browse/SOLR-17106
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Aparna Suresh
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following discussion from a dev list discussion here: 
> [https://lists.apache.org/thread/f0zfmpg0t48xrtppyfsmfc5ltzsq2qqh]
> The issue involves scalability challenges in SolrJ's *LBSolrClient* when a 
> pod with numerous cores experiences connectivity problems. The "zombie" 
> tracking mechanism, operating on a core basis, becomes a bottleneck during 
> distributed search on a massive multi shard collection. Threads attempting to 
> reach unhealthy cores contribute to a high computational load, causing 
> performance issues. 
> As suggested by Chris Hostetter: LBSolrClient could be configured to disable 
> zombie "ping" checks, but retain zombie tracking. Once a replica/endpoint is 
> identified as a zombie, it could be held in zombie jail for X seconds, before 
> being released - hoping that by this timeframe ZK would be updated to mark 
> this endpoint DOWN or the pod is back up and CloudSolrClient would avoid 
> querying it. In any event, only 1 failed query would be needed to send the 
> server back to zombie jail.
>  
> There are benefits in doing this change:
>  * Eliminate the zombie ping requests, which would otherwise overload pod(s) 
> coming up after a restart
>  * Avoid memory leaks, in case a node/replica goes away permanently, but it 
> stays as zombie forever, with a background thread in LBSolrClient constantly 
> pinging it



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to