Hi all, I have an 8-node SolrCloud 5.5 cluster with 11 collections,
most of them in a 1 shard x 8 replicas configuration. We have 5 ZK
nodes.

During the night, we attempted to reindex one of the larger
collections. We reindex by pushing JSON docs to the update handler
from a number of processes. This seems to have overwhelmed the servers
and caused all of the collections to fail and end up in either a down
or a recovering state, often with no leader.

Restarting and rebooting the servers brought a lot of the collections
back online, but we are left with a few collections for which all the
nodes hosting their replicas are up, yet each replica reports as either
"active" or "down", and there is no leader.
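
For anyone trying to reproduce the picture described above, here is a
minimal sketch of how the leaderless shards could be listed. It assumes
the JSON returned by the Collections API CLUSTERSTATUS action has
already been fetched and parsed into a dict; the function name and the
sample data are hypothetical, not taken from our cluster.

```python
from typing import Dict, List, Tuple


def leaderless_shards(cluster_status: Dict) -> List[Tuple[str, str, Dict[str, str]]]:
    """Return (collection, shard, {replica: state}) for each shard with no leader."""
    found = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = shard.get("replicas", {})
            # Assumption: the leader replica carries leader == "true" (a string).
            has_leader = any(
                str(r.get("leader", "")).lower() == "true" for r in replicas.values()
            )
            if not has_leader:
                states = {name: r.get("state", "unknown") for name, r in replicas.items()}
                found.append((coll_name, shard_name, states))
    return found


# Hypothetical sample: one healthy collection, one with no leader.
sample = {"cluster": {"collections": {
    "good": {"shards": {"shard1": {"replicas": {
        "core_node1": {"state": "active", "leader": "true"},
        "core_node2": {"state": "active"}}}}},
    "bad": {"shards": {"shard1": {"replicas": {
        "core_node1": {"state": "down"},
        "core_node2": {"state": "active"}}}}},
}}}
print(leaderless_shards(sample))
```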

Trying to force a leader election has no effect; it keeps choosing a
leader that is in the "down" state. Removing all the nodes that are in
the "down" state and forcing a leader election also has no effect.
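
To make the kind of request concrete, here is a rough sketch of a
forced election call. It assumes the Collections API FORCELEADER action
(documented for Solr 5.4 and later) is the mechanism in question; the
host, collection and shard names are placeholders, and the helper name
is made up for illustration.

```python
from urllib.parse import urlencode


def force_leader_url(base_url: str, collection: str, shard: str) -> str:
    """Build the Collections API URL that asks Solr to force a leader for one shard."""
    params = {
        "action": "FORCELEADER",
        "collection": collection,
        "shard": shard,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Placeholder values; substitute a real node, collection and shard.
print(force_leader_url("http://localhost:8983/solr", "mycollection", "shard1"))
```

The resulting URL can then be requested with curl or a browser against
any live node.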


Any ideas? The only viable option I see is to create a new collection,
index it and then remove the old collection and alias it in.
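
A sketch of what that fallback could look like, assuming the standard
Collections API actions CREATE, DELETE and CREATEALIAS. All names here
(products, products_v2, products_conf) and the helper itself are
hypothetical, and the reindex step happens outside this code, between
the first and second request.

```python
from typing import List
from urllib.parse import urlencode


def rebuild_plan(base_url: str, old: str, new: str,
                 config_name: str, replication_factor: int) -> List[str]:
    """Return, in order, the Collections API URLs for rebuilding a collection."""
    api = base_url.rstrip("/") + "/admin/collections?"
    steps = [
        # 1. Create the replacement collection (1 shard, N replicas).
        {"action": "CREATE", "name": new, "numShards": 1,
         "replicationFactor": replication_factor,
         "collection.configName": config_name},
        # (Reindex into the new collection here, before going further.)
        # 2. Remove the broken collection.
        {"action": "DELETE", "name": old},
        # 3. Point the old name at the new collection via an alias.
        {"action": "CREATEALIAS", "name": old, "collections": new},
    ]
    return [api + urlencode(step) for step in steps]


for url in rebuild_plan("http://localhost:8983/solr",
                        "products", "products_v2", "products_conf", 8):
    print(url)
```

Clients that query the old name would then be served by the new
collection through the alias.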

Cheers

Tom
