[ https://issues.apache.org/jira/browse/SOLR-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhaohui Ma updated SOLR-13061:
------------------------------
    Summary: Solr replica remaining down status when hitting the maxQueueSize as 20000 after Solr servers restarted  (was: Solr replica remaining down status when hitting the maxQueueSize as 20000 after restart Solr servers)

> Solr replica remaining down status when hitting the maxQueueSize as 20000 after Solr servers restarted
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13061
>                 URL: https://issues.apache.org/jira/browse/SOLR-13061
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 7.2.1
>         Environment: Cluster info: 6 nodes, 30 Solr servers
> 1000 collections, 10 shards per collection, 3 replicas per shard
> Exception happened when restarting Solr cluster.
>            Reporter: Zhaohui Ma
>            Priority: Blocker
>              Labels: performance
>
> 1. Cluster info: 6 nodes, 30 Solr servers
> 1000 collections, 10 shards per collection, 3 replicas per shard
>
> 2. The exception happened when restarting the Solr cluster. The problem is that no
> exception handler is defined for the "java.lang.IllegalStateException: queue is full"
> that is thrown when the queue reaches the threshold STATE_UPDATE_MAX_QUEUE (20000)
> defined in Overseer. The core then fails to preRegister and never comes up again.
>
> 3. Suggestions:
> a. Is the STATE_UPDATE_MAX_QUEUE setting reasonable? Is there any plan, or any risk,
> in enlarging this queue size, since 20000 is far too small?
> b. The IllegalStateException should be handled, and retry logic should be added.
>
> 4. The detailed error is given below.
> 2018-12-12 11:20:24,737 | ERROR | coreContainerWorkExecutor-2-thread-1-processing-n:8.5.165.7:21101_solr | Error waiting for SolrCore to be created | org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:578)
> java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:574)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
>     at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:546)
>     ... 5 more
> Caused by: java.lang.IllegalStateException: queue is full
>     at org.apache.solr.cloud.ZkDistributedQueue.offer(ZkDistributedQueue.java:311)
>     at org.apache.solr.cloud.ZkController.publish(ZkController.java:1346)
>     at org.apache.solr.cloud.ZkController.publish(ZkController.java:1245)
>     at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1634)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1061)
>     ... 6 more

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
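
Suggestion 3b in the issue (handle the IllegalStateException and retry) could look roughly like the sketch below. This is only an illustration, not Solr code: `StateQueue`, `offerWithRetry`, the attempt limit, and the backoff values are all hypothetical names and numbers chosen for the example, and the real `ZkDistributedQueue.offer` has a different signature. The idea is simply to wait and try again while the queue is full, and to give up only after a bounded number of attempts.

```java
public class QueueFullRetry {

    /** Hypothetical stand-in for a bounded state-update queue; NOT a Solr interface. */
    interface StateQueue {
        /** Assumed to throw IllegalStateException("queue is full") when at capacity. */
        void offer(byte[] data);
    }

    /**
     * Offers data to the queue, retrying with exponential backoff while the queue is full.
     * Returns the number of attempts used; rethrows the last IllegalStateException
     * once maxAttempts is exhausted.
     */
    static int offerWithRetry(StateQueue queue, byte[] data, int maxAttempts, long initialBackoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                queue.offer(data);
                return attempt;
            } catch (IllegalStateException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the caller still sees "queue is full"
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                backoffMs = Math.min(backoffMs * 2, 5_000L); // cap the wait (arbitrary value)
            }
        }
    }

    public static void main(String[] args) {
        // Simulated queue that reports "full" on the first two offers only.
        int[] calls = {0};
        StateQueue flaky = data -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("queue is full");
            }
        };
        int attempts = offerWithRetry(flaky, new byte[0], 5, 10L);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

A bounded retry keeps a restarting core from failing permanently on a transient backlog, while still surfacing the error if the queue never drains. Whether retrying is appropriate inside `preRegister`, and what limits make sense, would need to be decided by the Solr developers.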