Maxim Novikov created SOLR-5961:
-----------------------------------

             Summary: Solr gets crazy on /overseer/queue state change
                 Key: SOLR-5961
                 URL: https://issues.apache.org/jira/browse/SOLR-5961
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 4.7.1
         Environment: CentOS, 1 shard - 3 replicas, ZK cluster with 3 nodes 
(separate machines)
            Reporter: Maxim Novikov
            Priority: Critical


I have no idea how to reproduce it, but sometimes Solr starts littering the log 
with the following messages:

419158 [localhost-startStop-1-EventThread] INFO  org.apache.solr.cloud.DistributedQueue  ? LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged

419190 [Thread-3] INFO  org.apache.solr.cloud.Overseer  ? Update state numShards=1 message={
  "operation":"state",
  "state":"recovering",
  "base_url":"http://${IP_ADDRESS}/solr",
  "core":"${CORE_NAME}",
  "roles":null,
  "node_name":"${NODE_NAME}_solr",
  "shard":"shard1",
  "collection":"${COLLECTION_NAME}",
  "numShards":"1",
  "core_node_name":"core_node2"}

These messages keep coming with no delay, and restarting all the nodes does not 
help. I have even tried stopping all the nodes in the cluster first, but when I 
then start a single node the behavior does not change: it goes crazy with this 
"/overseer/queue state" message again.
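
In case it helps with triage, here is a rough sketch of how the repetition 
could be quantified from a log file. It assumes the log format shown above (a 
leading millisecond counter, then the thread name in brackets) and counts the 
Overseer "Update state" records per core_node_name and state. The function 
name and regular expressions are illustrative only and are not part of Solr.

```python
import re
from collections import Counter

# Start of a log record in the format shown above:
# "<millis since start> [<thread name>] <LEVEL> ..."
RECORD_START = re.compile(r"^\d+ \[[^\]]+\]\s+\w+")
STATE = re.compile(r'"state":"([^"]+)"')
CORE_NODE = re.compile(r'"core_node_name":"([^"]+)"')


def _tally(record_lines, counts):
    """Add one record to counts if it is an Overseer 'Update state' message."""
    text = "".join(record_lines)
    if "Update state" not in text:
        return
    state = STATE.search(text)
    node = CORE_NODE.search(text)
    if state and node:
        counts[(node.group(1), state.group(1))] += 1


def count_state_updates(log_text):
    """Count 'Update state' records per (core_node_name, state) pair."""
    counts = Counter()
    record = []
    for line in log_text.splitlines(keepends=True):
        # A new timestamped line closes the previous (possibly multi-line) record.
        if RECORD_START.match(line) and record:
            _tally(record, counts)
            record = []
        record.append(line)
    if record:
        _tally(record, counts)
    return counts
```

Calling count_state_updates() on the contents of the Solr log and looking at 
the largest counts should show whether a single replica is stuck re-publishing 
the same state.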

PS: The only way I found to handle this was to stop everything, manually clean 
up all the Solr-related data in ZooKeeper, and then rebuild everything from 
scratch. As you can understand, this is unbearable in a production 
environment.



