Hi,

I am indexing data into a 64-shard collection on Solr 4.10.3 in a 19-node CDH cluster running over HDFS. Indexing runs very well for the initial few hours (5-6), after which nodes across the cluster start showing health issues (the affected nodes vary randomly) and the indexing speed drops sharply. I followed the Solr tuning guidelines at https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_tuning_solr.html#csug_topic_10
and tried them, but they did not help. I observed that decreasing "solr.hdfs.blockcache.slab.count" to a very low value (32) improves indexing a lot, but only for the initial few hours.

Some of the errors I get in the server-side logs are:

org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.solr.core.SolrCore: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
org.apache.solr.update.processor.DistributedUpdateProcessor: ClusterState says we are the leader, but locally we don't think so

The cluster never recovers on its own after hitting these errors. Restarting the cluster does solve the problem, but it starts occurring again after a few hours.

I would appreciate suggestions, guidelines, or helpful links on which parameters I should consider, and their recommended values, to ensure stable and smooth indexing.

--
Sathyam Doraswamy