[ https://issues.apache.org/jira/browse/SOLR-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhaohui Ma updated SOLR-13061:
------------------------------
    Description: 
1. Cluster info: 6 nodes, 30 Solr servers

1000 collections, 10 shards per collection, 3 replicas per shard

The exception occurred while restarting the Solr cluster.

 

2. The exception occurred while restarting the Solr cluster. The problem is
that no exception handler is defined for
"java.lang.IllegalStateException: queue is full", which is thrown when the
state update queue reaches the threshold STATE_UPDATE_MAX_QUEUE (20000)
defined in Overseer. As a result, the core fails to preRegister and never
comes up again.
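
For scale, the cluster described in item 1 holds 1000 x 10 x 3 = 30,000
replicas. The sketch below only checks that arithmetic against the 20000
limit. It assumes at least one queued state update per replica during a
restart and an Overseer that has not yet drained the queue; both are
assumptions, not measurements from this cluster. The class and method names
are hypothetical and are not part of Solr.

```java
// Back-of-the-envelope check. Assumes one pending state update per replica,
// which is an illustrative assumption rather than a measured value.
final class QueueHeadroom {
    // Limit quoted in this report for STATE_UPDATE_MAX_QUEUE.
    static final int STATE_UPDATE_MAX_QUEUE = 20_000;

    // Total replicas in the cluster; multiplyExact guards against int overflow.
    static int totalReplicas(int collections, int shardsPerCollection, int replicasPerShard) {
        return Math.multiplyExact(
                Math.multiplyExact(collections, shardsPerCollection), replicasPerShard);
    }

    // True when the number of pending messages is above the queue limit.
    static boolean canOverflow(int pendingMessages) {
        return pendingMessages > STATE_UPDATE_MAX_QUEUE;
    }

    public static void main(String[] args) {
        int replicas = totalReplicas(1000, 10, 3);
        System.out.println("replicas = " + replicas);
        System.out.println("can exceed limit = " + canOverflow(replicas));
    }
}
```

If every replica publishes a state update before the Overseer catches up,
the number of pending messages can pass the limit, which would be consistent
with the "queue is full" error reported here.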

 

3. Suggestions:

a. Is the STATE_UPDATE_MAX_QUEUE setting reasonable? Is there any plan to
enlarge this queue, or any risk in doing so? 20,000 seems much too small.

b. The IllegalStateException should be handled, and retry logic should be
added.
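
To make suggestion b concrete, here is a minimal sketch of retry with
exponential backoff around a bounded queue. It is not Solr code and not a
proposed patch: java.util.concurrent.ArrayBlockingQueue stands in for
ZkDistributedQueue because its add() also throws IllegalStateException when
the queue is full, and RetryingPublisher, publishWithRetry, the attempt
count, and the backoff values are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class RetryingPublisher {
    // Upper bound on a single backoff sleep (illustrative value).
    private static final long MAX_BACKOFF_MS = 5_000L;

    // Tries queue.add(message) up to maxAttempts times, doubling the wait after
    // each "queue full" failure. Returns true once the message is enqueued, and
    // false if all attempts fail or the thread is interrupted while waiting.
    static <T> boolean publishWithRetry(BlockingQueue<T> queue, T message,
                                        int maxAttempts, long initialBackoffMs) {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                queue.add(message); // throws IllegalStateException when the queue is full
                return true;
            } catch (IllegalStateException queueFull) {
                if (attempt == maxAttempts) {
                    return false; // give up and let the caller decide what to do
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
                backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.add("state-update-1"); // the queue is now full

        // A consumer that frees one slot after a short delay, playing the
        // role of the Overseer draining the queue.
        Thread consumer = new Thread(() -> {
            try {
                Thread.sleep(50);
                queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        boolean published = publishWithRetry(queue, "state-update-2", 5, 20);
        consumer.join();
        System.out.println("published = " + published);
    }
}
```

Returning false instead of rethrowing leaves the decision to the caller, for
example scheduling a later registration attempt rather than leaving the core
down permanently.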

 

4. The detailed error is given below.

2018-12-12 11:20:24,737 | ERROR | coreContainerWorkExecutor-2-thread-1-processing-n:8.5.165.7:21101_solr | Error waiting for SolrCore to be created | org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:578)
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:574)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
 at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:546)
 ... 5 more
Caused by: java.lang.IllegalStateException: queue is full
 at org.apache.solr.cloud.ZkDistributedQueue.offer(ZkDistributedQueue.java:311)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1346)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1245)
 at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1634)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1061)
 ... 6 more

  was:
1. Cluster info: 6 nodes, 30 Solr servers

1000 collections, 10 shards per collection, 3 replicas per shard

The exception occurred while restarting the Solr cluster.

 

2. The exception occurred while restarting the Solr cluster. The problem is
that no exception handler is defined for
"java.lang.IllegalStateException: queue is full", which is thrown when the
state update queue reaches the threshold STATE_UPDATE_MAX_QUEUE (20000)
defined in Overseer. As a result, the core fails to preRegister and never
comes up again.

 

3. Suggestions:

a. Is the STATE_UPDATE_MAX_QUEUE setting reasonable?

b. The IllegalStateException should be handled, and retry logic should be
added.

 

4. The detailed error is given below.

2018-12-12 11:20:24,737 | ERROR | coreContainerWorkExecutor-2-thread-1-processing-n:8.5.165.7:21101_solr | Error waiting for SolrCore to be created | org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:578)
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:574)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
 at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:546)
 ... 5 more
Caused by: java.lang.IllegalStateException: queue is full
 at org.apache.solr.cloud.ZkDistributedQueue.offer(ZkDistributedQueue.java:311)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1346)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1245)
 at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1634)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1061)
 ... 6 more


> Solr replica remaining down status when hitting the maxQueueSize as 20000 
> after restart Solr servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13061
>                 URL: https://issues.apache.org/jira/browse/SOLR-13061
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 7.2.1
>         Environment: Cluster info: 6 nodes, 30 Solr servers
> 1000 collections, 10 shards per collection, 3 replicas per shard
> The exception occurred while restarting the Solr cluster.
>            Reporter: Zhaohui Ma
>            Priority: Blocker
>              Labels: performance



