[jira] [Commented] (SOLR-17348) Mitigate extreme parallelism of zkCallback executor

David Smiley (Jira) Wed, 26 Jun 2024 03:53:04 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860121#comment-17860121
 ]


David Smiley commented on SOLR-17348:
-------------------------------------

bq. The new config might look something like: corePoolSize=1024

I don't think we should have corePoolSize so high on any of our executors.  
It's pretty wasteful when bursts are rare.  Here it's just for 
callbacks; I think 1 is fine.

Interesting strategy of using a single thread to root out limitations / 
expectations of parallelism.  Even if there are some dependencies among tasks, 
no system can support an unlimited number of threads.  It'd be nice to switch 
to Java [virtual 
threads|https://blogs.oracle.com/javamagazine/post/java-virtual-threads] for 
this specific executor but we haven't explored them in Solr yet; there are some 
constraints.

I did some digging on limiting the capacity of Executors without rejection; 
Stack Overflow has a number of solutions, each with subtle pros/cons.  I like [this 
answer|https://stackoverflow.com/a/24420823/92186] (which I added a comment on).
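
For illustration, here is a minimal sketch of one commonly suggested way to cap an executor without rejecting tasks: a Semaphore makes the submitter wait once too many tasks are in flight. This is not necessarily the approach in the linked answer, and the class name BlockingSubmitExecutor, the maxInFlight parameter, and the maxObservedConcurrency helper are hypothetical names used only for this example.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper: blocks the caller instead of rejecting when the cap is reached. */
public class BlockingSubmitExecutor implements Executor {
  private final Executor delegate;
  private final Semaphore permits;

  public BlockingSubmitExecutor(Executor delegate, int maxInFlight) {
    this.delegate = delegate;
    this.permits = new Semaphore(maxInFlight);
  }

  @Override
  public void execute(Runnable task) {
    // Wait here (rather than throwing) until an in-flight task finishes.
    permits.acquireUninterruptibly();
    try {
      delegate.execute(() -> {
        try {
          task.run();
        } finally {
          permits.release(); // free the slot whether the task succeeded or failed
        }
      });
    } catch (RejectedExecutionException e) {
      permits.release(); // the task never started, so return its permit
      throw e;
    }
  }

  /** Demo helper: runs short tasks through the wrapper and reports the peak concurrency seen. */
  static int maxObservedConcurrency(int limit, int tasks) {
    ExecutorService pool = Executors.newCachedThreadPool();
    BlockingSubmitExecutor bounded = new BlockingSubmitExecutor(pool, limit);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    for (int i = 0; i < tasks; i++) {
      bounded.execute(() -> {
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
          Thread.sleep(5); // simulate a little work
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        running.decrementAndGet();
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return peak.get();
  }

  public static void main(String[] args) {
    System.out.println("peak concurrency: " + maxObservedConcurrency(2, 10));
  }
}
```

The trade-off is that the submitting thread stalls while the cap is reached, so whether this suits zkCallback depends on which thread does the submitting.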

> Mitigate extreme parallelism of zkCallback executor
> ---------------------------------------------------
>
>                 Key: SOLR-17348
>                 URL: https://issues.apache.org/jira/browse/SOLR-17348
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Michael Gibney
>            Priority: Minor
>
> zkCallback executor is [currently an unbounded thread pool of core size 
> 0|https://github.com/apache/solr/blob/709a1ee27df23b419d09fe8f67c3276409131a4a/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L91-L92],
>  using a SynchronousQueue. Thus, a flood of zkCallback events (as might be 
> triggered by a cluster restart, e.g.) can result in spinning up a very large 
> number of threads. In practice we have encountered as many as 35k threads 
> created in some such cases, even after the impact of this situation was 
> reduced by the fix for SOLR-11535.
> Inspired by [~cpoerschke]'s recent [closer look at thread pool 
> behavior|https://issues.apache.org/jira/browse/SOLR-13350?focusedCommentId=17853178#comment-17853178],
>  I wondered if we might be able to employ a bounded queue to alleviate some 
> of the pressure from bursty zk callbacks.
> The new config might look something like: {{corePoolSize=1024, 
> maximumPoolSize=Integer.MAX_VALUE, allowCoreThreadTimeOut=true, workQueue=new 
> LinkedBlockingQueue<>(1024)}}. This would allow the pool to grow up to (and 
> shrink from) corePoolSize in the same manner it currently does, but once 
> exceeding corePoolSize (e.g. during a cluster restart or other callback flood 
> event), tasks would be queued (up to some fixed limit). If the queue limit is 
> exceeded, new threads would still be created, but we would have avoided the 
> current “always create a thread” behavior, and by so doing hopefully reduce 
> task execution time and improve overall throughput.
> From the ThreadPoolExecutor javadocs:
> {quote}Direct handoffs. A good default choice for a work queue is a 
> SynchronousQueue that hands off tasks to threads without otherwise holding 
> them. Here, an attempt to queue a task will fail if no threads are 
> immediately available to run it, so a new thread will be constructed. This 
> policy avoids lockups when handling sets of requests that might have internal 
> dependencies. Direct handoffs generally require unbounded maximumPoolSizes to 
> avoid rejection of new submitted tasks. This in turn admits the possibility 
> of unbounded thread growth when commands continue to arrive on average faster 
> than they can be processed.{quote}
> So afaict SynchronousQueue mainly makes sense if there exists the possibility 
> of deadlock due to dependencies among tasks, and I think this should ideally 
> _not_ be the case with zk callbacks (though in practice I'm not sure this is 
> the case).
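
To make the difference concrete, the sketch below submits a burst of tasks that all block until released, then reports the largest pool size reached. It is a toy illustration using plain java.util.concurrent, not Solr code; the class name CallbackPoolDemo, the peakThreads helper, and the small sizes (a core of 4 and a queue of 16 rather than 1024) are assumptions chosen to keep the demo cheap.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallbackPoolDemo {

  /** Submits 'burst' blocking tasks and returns the largest pool size the executor reached. */
  static int peakThreads(BlockingQueue<Runnable> queue, int corePoolSize, int burst) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(corePoolSize, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, queue);
    pool.allowCoreThreadTimeOut(true); // idle core threads may also time out

    CountDownLatch release = new CountDownLatch(1);
    for (int i = 0; i < burst; i++) {
      pool.execute(() -> {
        try {
          release.await(); // hold the thread so no worker becomes idle during the burst
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    int peak = pool.getLargestPoolSize();

    release.countDown(); // let every task finish
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return peak;
  }

  public static void main(String[] args) {
    // Direct handoff, core size 0: every task in the burst gets its own thread.
    System.out.println("SynchronousQueue, burst 64: "
        + peakThreads(new SynchronousQueue<>(), 0, 64));
    // Bounded queue: 4 core threads start, the remaining 16 tasks wait in the queue.
    System.out.println("Bounded queue, burst 20:    "
        + peakThreads(new LinkedBlockingQueue<>(16), 4, 20));
    // Queue overflow: the 4 tasks that do not fit each trigger an extra thread.
    System.out.println("Bounded queue, burst 24:    "
        + peakThreads(new LinkedBlockingQueue<>(16), 4, 24));
  }
}
```

Because every task blocks until the latch opens, the three cases should report 64, 4, and 8 threads respectively: the bounded queue absorbs the part of the burst that fits and only grows the pool past corePoolSize once the queue is full.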



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
