[ https://issues.apache.org/jira/browse/SOLR-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152227#comment-17152227 ]

Michael DeBruyn commented on SOLR-11208:
----------------------------------------

This makes autoscaling policies virtually useless.  I'm currently running 
7.7.3 (testing with 8.5.2) with 3x TLOG and 6x PULL nodes that serve 19 
collections, where the PULL nodes are somewhat transient in K8s.  When a node 
is replaced, the node_lost_trigger and node_added_trigger we have in place 
fail more often than not, because the thread pool is tiny (10 threads) and 
cannot queue any requests beyond that.
{noformat}
          "response": [
            "Operation deletenode caused exception:",
            "java.util.concurrent.RejectedExecutionException:java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$190/0x00007f1a4c8a3db8@3d9a7483 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@40bcc8d4[Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 259]",
            "exception",
            {
              "msg": "Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$190/0x00007f1a4c8a3db8@3d9a7483 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@40bcc8d4[Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 259]",
              "rspCode": -1
            }
          ]
{noformat}
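
For illustration, the same rejection can be reproduced outside Solr with a plain java.util.concurrent.ThreadPoolExecutor. This is only a minimal sketch: the class and method names are hypothetical, and it assumes Solr's MDCAwareThreadPoolExecutor follows the standard ThreadPoolExecutor admission rules. The pool sizing (core 5, max 10, SynchronousQueue) is the one quoted in the issue description.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Solr.
public class SynchronousQueueRejectionDemo {

    /** Submits the given number of blocking tasks and returns how many were rejected. */
    static int countRejected(int tasks) throws InterruptedException {
        // Core 5, max 10, and a SynchronousQueue, which has no capacity at all.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 10, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task blocks until the latch opens, so every thread stays busy.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // No idle thread to hand the task to and the pool is at max size.
                    rejected++;
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected = " + countRejected(15));
    }
}
```

With 15 long-running tasks, the first 10 each get a thread and the remaining 5 are rejected, which matches the "pool size = 10, active threads = 10, queued tasks = 0" state in the response above.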
 

> Usage SynchronousQueue in Executors prevent large scale operations
> ------------------------------------------------------------------
>
>                 Key: SOLR-11208
>                 URL: https://issues.apache.org/jira/browse/SOLR-11208
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 6.6
>            Reporter: Björn Häuser
>            Priority: Major
>         Attachments: response.json
>
>
> I am not sure where to start with this one.
> I tried to post this already on the mailing list: 
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201708.mbox/%3c48c49426-33a2-4d79-ae26-a4515b8f8...@gmail.com%3e
> In short: using a SynchronousQueue as the workQueue means the executor 
> cannot accept more tasks than its maximum number of threads.
> For example, taken from OverseerCollectionMessageHandler:
> {code:java}
>   ExecutorService tpe = new ExecutorUtil.MDCAwareThreadPoolExecutor(5, 10, 0L, TimeUnit.MILLISECONDS,
>       new SynchronousQueue<>(),
>       new DefaultSolrThreadFactory("OverseerCollectionMessageHandlerThreadFactory"));
> {code}
> This Executor is used when doing a REPLACENODE (= ADDREPLICA) command. When 
> the node has more than 10 collections, this will fail with the mentioned 
> java.util.concurrent.RejectedExecutionException.
> I am also not sure how to fix this. Just replacing the queue with a different 
> implementation feels wrong to me or could cause unwanted side effects.
> Thanks
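
One side effect of simply swapping the queue can be shown with the plain JDK executor. The sketch below is not a proposed fix for Solr and the class name is hypothetical. It assumes standard ThreadPoolExecutor behaviour, where threads beyond the core size are created only when the queue refuses a task. An unbounded LinkedBlockingQueue never refuses, so nothing is rejected, but the pool never grows past its 5 core threads and the maximum of 10 is effectively ignored.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Solr.
public class UnboundedQueueTradeoffDemo {

    /** Returns {poolSize, queuedTasks} observed after submitting blocking tasks. */
    static int[] observe(int tasks) throws InterruptedException {
        // Same core/max sizing as in the issue, but with an unbounded queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 10, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Tasks block until the latch opens, so running threads stay busy.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Snapshot taken while all submitted tasks are still blocked or queued.
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] state = observe(15);
        System.out.println("pool size = " + state[0] + ", queued tasks = " + state[1]);
    }
}
```

With 15 tasks, 5 run and 10 wait in the queue. Whether that reduced parallelism is acceptable for the Overseer is a separate question that this sketch does not answer.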



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
