> Improved cluster stability. Restarting the leader is far simpler than
> electing a new leader, peer syncing, index fingerprinting etc

(I'll assume a single TLOG replica on its own pod, as I think Joel
suggested in his latest reply.)

Restarts are definitely simpler than leader election, but I'm not sure
they'd always be quicker. In fact, I can imagine scenarios where
restarting the one TLOG replica would be significantly slower: a new Kube
node needs to pull the Solr image over a slow network, the cluster is near
capacity and the Kube scheduler can't find a place to run the TLOG-hosting
pod, etc. I'm not sure which approach would be quicker on average. It
would be really interesting to test, given its implications for how
quickly updates would get re-enabled.

Of course, speed is only one concern among many. Single-TLOG sounds like
an awesome option for users who worry more about data loss than indexing
uptime. That's probably the biggest advantage of the scheme you propose,
IMO.

Best,

Jason

On Fri, Oct 29, 2021 at 8:51 AM Joel Bernstein wrote:
>
> Kube may have solutions to your questions. It's mainly about carefully
> constructing collections. One approach would be to place each tlog leader
> in its own pod and use node anti-affinity rules to spread them across
> Kubernetes nodes and availability zones. We're currently working on a Solr
> collections operator which creates collections using the Solr operator to
> allocate the Solr nodes. The collections operator is where all the
> intelligence resides for creating collections that maximize resiliency on
> Kubernetes.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Oct 28, 2021 at 8:22 PM Ilan Ginzburg wrote:
>>
>> The idea is tempting...
>> Limiting to one tlog replica per shard might not be sufficient though. What
>> if a node has too many shard leaders and we want to rebalance these across
>> the cluster to other nodes?
>> What if a node has some intrinsic issue (it runs out of memory each time,
>> or is unable to start because it hosts too many replicas)? We need a
>> mechanism to transfer shard leadership.
>>
>> I've been considering a different approach (but haven't dug very deep into
>> it yet): skip shard leader election and, based on replica terms from ZK,
>> pick one of the most up-to-date replicas and consider it the leader (i.e.
>> send indexing there). Given that two replicas of the same shard might then
>> be indexing concurrently, we must make sure that if anything goes wrong
>> (updates can't be propagated), one or both batches fail.
>>
>> Ilan
>>
>>
>>
>> On Thu, Oct 28, 2021 at 8:22 PM Joel Bernstein wrote:
>>>
>>> As I get deeper into Solr on kube, I've begun to wonder if Solr leader
>>> election on kube is an obsolete concept. Leader election was conceived
>>> when hardware was not fungible. Now that hardware is fungible I wonder if
>>> it's time to rethink the whole idea of leader election.
>>>
>>> Consider the following scenario:
>>>
>>> A collection where each shard has 1 tlog replica and N pull replicas. A
>>> shard leader goes down, indexing fails on the shard for a period of time,
>>> kube restarts the leader, indexing succeeds on the shard. Pull replicas
>>> continue to accept queries the entire time.
>>>
>>> There are three main advantages of this kind of setup:
>>>
>>> 1) Potential for zero data loss. In this scenario indexing either succeeds
>>> or it fails. We no longer have data loss that comes from a lack of a two
>>> phase commit across a set of tlog or nrt replicas. Now there is only one
>>> shard leader, which has a transaction redo log, and this makes zero data
>>> loss much, much easier to achieve.
>>>
>>> 2) Improved cluster stability. Restarting the leader is far simpler than
>>> electing a new leader, peer syncing, index fingerprinting etc... and would
>>> eliminate a whole class of operational issues.
>>>
>>> 3) The phasing out of nrt, and maybe even leader election in the code
>>> base, greatly decreases the amount of code complexity and allows
>>> committers to harden the eventually consistent model.
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
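
For reference, the term-based alternative Ilan describes in the thread (skip the election and treat one of the most up-to-date replicas as the leader) can be sketched roughly as below. This is only an illustration under stated assumptions: `Replica`, its fields, and `pick_leader` are hypothetical names, not Solr APIs, and the terms would in practice be read from ZK rather than passed in. It also does not cover the harder part Ilan raises, namely failing one or both batches when two replicas index concurrently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    name: str   # hypothetical replica name, e.g. "core_node1"
    term: int   # hypothetical per-replica term; higher means more up to date
    live: bool  # whether the replica's node is currently reachable


def pick_leader(replicas: List[Replica]) -> Optional[Replica]:
    """Return a live replica with the highest term, or None if none is live."""
    live = [r for r in replicas if r.live]
    if not live:
        return None
    # Highest term wins; ties are broken by name so that every caller
    # looking at the same data picks the same replica.
    return min(live, key=lambda r: (-r.term, r.name))
```

Breaking ties deterministically is a design choice of this sketch: it keeps independent clients from routing updates to different replicas when several share the top term.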
