Kube may have solutions to your questions. It's mainly about carefully constructing collections. One approach would be to place each tlog leader in its own pod and use node anti-affinity rules to spread them across Kubernetes nodes and availability zones. We're currently working on a Solr collections operator which creates collections, using the Solr operator to allocate the Solr nodes. The collections operator is where all the intelligence resides for creating collections that maximize resiliency on Kubernetes.
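
To make the collection side concrete, here's a minimal SolrJ sketch of the kind of collection the thread is describing: one tlog replica per shard (the leader) plus pull replicas, and no nrt replicas. The collection name, configset, ZooKeeper address, and replica counts are just placeholders, and in practice this logic would live in the collections operator rather than a one-off client:

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateTlogPullCollection {
  public static void main(String[] args) throws Exception {
    // Placeholder ZK address; point this at the cluster's ZooKeeper ensemble.
    try (CloudSolrClient client =
        new CloudSolrClient.Builder(
            Collections.singletonList("zookeeper:2181"), Optional.empty()).build()) {

      // 2 shards, each with 0 nrt replicas, 1 tlog replica (the writer that
      // kube restarts on failure) and 2 pull replicas that keep serving queries.
      CollectionAdminRequest
          .createCollection("orders", "_default", 2, 0, 1, 2)
          .process(client);
    }
  }
}

The anti-affinity piece would then be handled by the operator, so the tlog pods land on different nodes and zones.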
Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Oct 28, 2021 at 8:22 PM Ilan Ginzburg <[email protected]> wrote:

> The idea is tempting...
> Limiting to one tlog replica per shard might not be sufficient though.
> What if a node has too many shard leaders and we want to rebalance these
> across the cluster to other nodes?
> What if a node has some intrinsic issue (runs out of memory each time, or
> is unable to start due to too many replicas)? We need a mechanism to
> transfer shard leadership.
>
> I've been considering a different approach (but haven't dug very deep into
> it yet): skip shard leader election and, based on replica terms from ZK,
> pick one of the most up-to-date replicas and consider it the leader (i.e.
> send indexing there). Given that two replicas of the same shard might then
> be indexing concurrently, we must make sure that if anything goes wrong
> (updates can't be propagated), one or both batches fail.
>
> Ilan
>
>
> On Thu, Oct 28, 2021 at 8:22 PM Joel Bernstein <[email protected]> wrote:
>
>> As I get deeper into Solr on kube, I've begun to wonder if Solr leader
>> election on kube is an obsolete concept. Leader election was conceived
>> when hardware was not fungible. Now that hardware is fungible, I wonder
>> if it's time to rethink the whole idea of leader election.
>>
>> Consider the following scenario:
>>
>> A collection where each shard has 1 tlog replica and N pull replicas. A
>> shard leader goes down, indexing fails on the shard for a period of time,
>> kube restarts the leader, and indexing succeeds on the shard again. Pull
>> replicas continue to accept queries the entire time.
>>
>> There are three main advantages of this kind of setup:
>>
>> 1) Potential for zero data loss. In this scenario indexing either
>> succeeds or it fails. We no longer have the data loss that comes from the
>> lack of a two-phase commit across a set of tlog or nrt replicas. Now
>> there is only one shard leader, which has a transaction redo log, and
>> that makes it much, much easier to achieve zero data loss.
>>
>> 2) Improved cluster stability. Restarting the leader is far simpler than
>> electing a new leader, peer syncing, index fingerprinting, etc., and
>> would eliminate a whole class of operational issues.
>>
>> 3) Phasing out nrt, and maybe even leader election, in the code base
>> greatly decreases the amount of code complexity and allows committers to
>> harden the eventually consistent model.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
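
For context on the replica terms Ilan mentions above: Solr already tracks them in ZooKeeper under /collections/<collection>/terms/<shard> as a small JSON map of core node name to term. A rough sketch of "pick one of the most up-to-date replicas" from that data could look like the code below; the method, path handling, and Jackson parsing are illustrative assumptions, not an existing Solr API:

import java.util.Map;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.zookeeper.ZooKeeper;

public class HighestTermReplica {

  // Read the shard's replica terms from ZK and return the core node name with
  // the highest term, i.e. one of the most up-to-date copies of the shard.
  static String pickMostUpToDate(ZooKeeper zk, String collection, String shard) throws Exception {
    byte[] json = zk.getData("/collections/" + collection + "/terms/" + shard, false, null);

    // The terms znode holds a JSON map like {"core_node3": 5, "core_node5": 4}.
    Map<String, Long> terms =
        new ObjectMapper().readValue(json, new TypeReference<Map<String, Long>>() {});

    return terms.entrySet().stream()
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElseThrow(() -> new IllegalStateException("no terms found for shard " + shard));
  }
}

The hard part is the point Ilan raises at the end: with two replicas of the same shard potentially indexing concurrently, the failure handling (making one or both batches fail when updates can't be propagated) is what would actually need to be worked out.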
