Hi all, I have an 8-node SolrCloud 5.5 cluster with 11 collections, most of them in a 1 shard x 8 replicas configuration. We have 5 ZooKeeper nodes.
During the night, we attempted to reindex one of the larger collections. We reindex by pushing JSON docs to the update handler from a number of processes. This seems to have overwhelmed the servers: all of the collections failed and ended up in either a "down" or a "recovering" state, often with no leader.

Restarting and rebooting the servers brought a lot of the collections back online, but we are left with a few collections where all the nodes hosting the replicas are up, yet each replica reports as either "active" or "down" and the shard has no leader.

Trying to force a leader election has no effect; it keeps choosing a leader that is in the "down" state. Removing all the nodes that are in the "down" state and then forcing a leader election also has no effect.

Any ideas? The only viable option I see is to create a new collection, index it, and then remove the old collection and alias the new one in its place.

Cheers,
Tom
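
P.S. For anyone wanting to look for the same symptom on their own cluster, here is a minimal sketch of how the leaderless shards could be listed from the Collections API CLUSTERSTATUS output. It is an illustration only, not something we ran as part of the above. The function names `leaderless_shards` and `fetch_cluster_status` are made up for this example, and it assumes the usual CLUSTERSTATUS JSON layout, where the leader replica of a shard carries a `"leader": "true"` property.

```python
import json
from urllib.request import urlopen


def leaderless_shards(cluster_status):
    """Map (collection, shard) -> {replica name: state} for shards without a leader.

    `cluster_status` is the parsed JSON of a CLUSTERSTATUS response.
    Assumption: the leader replica is marked with "leader": "true".
    """
    result = {}
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = shard.get("replicas", {})
            has_leader = any(
                str(replica.get("leader", "false")).lower() == "true"
                for replica in replicas.values()
            )
            if not has_leader:
                result[(coll_name, shard_name)] = {
                    name: replica.get("state")
                    for name, replica in replicas.items()
                }
    return result


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as a dict; base_url is e.g. http://host:8983/solr (hypothetical)."""
    url = base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url) as resp:
        return json.load(resp)
```

Calling `leaderless_shards(fetch_cluster_status(base_url))` against any one node should then give the affected shards together with the state each replica reports, which is the situation described above: replicas "active" or "down", and no leader.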