Maxim Novikov created SOLR-5961:
-----------------------------------
Summary: Solr gets crazy on /overseer/queue state change
Key: SOLR-5961
URL: https://issues.apache.org/jira/browse/SOLR-5961
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.7.1
Environment: CentOS, 1 shard - 3 replicas, ZK cluster with 3 nodes
(separate machines)
Reporter: Maxim Novikov
Priority: Critical
I have no idea how to reproduce it, but sometimes Solr starts flooding the log
with the following messages:
419158 [localhost-startStop-1-EventThread] INFO
org.apache.solr.cloud.DistributedQueue - LatchChildWatcher fired on path:
/overseer/queue state: SyncConnected type NodeChildrenChanged
419190 [Thread-3] INFO org.apache.solr.cloud.Overseer - Update state
numShards=1 message={
"operation":"state",
"state":"recovering",
"base_url":"http://${IP_ADDRESS}/solr",
"core":"${CORE_NAME}",
"roles":null,
"node_name":"${NODE_NAME}_solr",
"shard":"shard1",
"collection":"${COLLECTION_NAME}",
"numShards":"1",
"core_node_name":"core_node2"}
It keeps spamming these messages with no delay, and restarting all the nodes
does not help. I even tried stopping every node in the cluster first, but as
soon as I start one again, the behavior is the same: it goes crazy with this
"/overseer/queue state" message again.
PS: The only way to recover was to stop everything, manually delete all the
Solr-related data in ZooKeeper, and then rebuild everything from scratch. As
you can imagine, that is hardly acceptable in a production environment.
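
To put a number on the log flood described above, here is a minimal sketch that counts LatchChildWatcher events per elapsed second. It assumes each log entry sits on a single line (the excerpt above is wrapped by the mail client) and that the leading number is a millisecond counter; the function name and regular expression are hypothetical helpers, not part of Solr.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above: a leading millisecond
# counter, a thread name in brackets, the level, the logger, the message.
WATCHER_RE = re.compile(
    r"^(\d+) \[[^\]]+\] INFO\s+org\.apache\.solr\.cloud\.DistributedQueue\b.*"
    r"LatchChildWatcher fired on path: (\S+)"
)


def watcher_events_per_second(lines, path="/overseer/queue"):
    """Count LatchChildWatcher events for `path`, bucketed by elapsed second."""
    buckets = Counter()
    for line in lines:
        match = WATCHER_RE.match(line)
        if match and match.group(2) == path:
            # Integer-divide the millisecond counter to get the second bucket.
            buckets[int(match.group(1)) // 1000] += 1
    return dict(buckets)
```

Feeding it the lines of a log file (for example `watcher_events_per_second(open("solr.log"))`, where the file name is a placeholder) gives one count per second; a persistently high count would match the "no delay" behavior reported here.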