Re: Leader election in Kube-land

Jason Gerlowski Fri, 29 Oct 2021 06:27:44 -0700

> Improved cluster stability.  Restarting the leader is far simpler than 
> electing a new leader, peer syncing, index fingerprinting, etc.


(I'll assume a single TLOG replica on its own pod as I think Joel
suggested in his latest reply.)

Restarts are definitely simpler than leader election, but I'm not sure
they'd always be quicker.  In fact, I can imagine scenarios where
restarting the one TLOG replica would be significantly slower: a new
Kube node needs to pull the Solr image over a slow network, the cluster
is near capacity and the Kube scheduler can't find a place to run the
TLOG-hosting pod, etc.  I'm not sure which approach would be quicker
on average; it would be really interesting to test, given the
implications for how quickly updates get re-enabled.

Of course, speed is only one concern among many.  Single-TLOG sounds
like an awesome option for users who worry more about data loss than
indexing uptime.  That's probably the biggest advantage of the scheme
you propose, IMO.

Best,

Jason

On Fri, Oct 29, 2021 at 8:51 AM Joel Bernstein wrote:
>
> Kube may have solutions to your questions. It's mainly about carefully
> constructing collections. One approach would be to place each TLOG leader in
> its own pod and use node anti-affinity rules to spread them across
> Kubernetes nodes and availability zones. We're currently working on a Solr
> collections operator, which creates collections using the Solr operator to
> allocate the Solr nodes. The collections operator is where all the
> intelligence resides for creating collections that maximize resiliency on
> Kubernetes.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Oct 28, 2021 at 8:22 PM Ilan Ginzburg wrote:
>>
>> The idea is tempting...
>> Limiting to one TLOG replica per shard might not be sufficient, though. What
>> if a node has too many shard leaders and we want to rebalance them across
>> the cluster to other nodes?
>> Or what if a node has some intrinsic issue (it runs out of memory every time,
>> or can't start because it hosts too many replicas)? We need a mechanism to
>> transfer shard leadership.
>>
>> I've been considering a different approach (but haven't dug very deep into
>> it yet): skip shard leader election and, based on replica terms from ZK,
>> pick one of the most up-to-date replicas and consider it the leader (i.e.
>> send indexing there). Given that two replicas of the same shard might then
>> be indexing concurrently, we must make sure that if anything goes wrong
>> (updates can't be propagated), one or both batches fail.
>>
>> Ilan
>>
>>
>>
>> On Thu, Oct 28, 2021 at 8:22 PM Joel Bernstein wrote:
>>>
>>> As I get deeper into Solr on kube, I've begun to wonder if Solr leader 
>>> election on kube is an obsolete concept. Leader election was conceived when 
>>> hardware was not fungible. Now that hardware is fungible I wonder if it's 
>>> time to rethink the whole idea of leader election.
>>>
>>> Consider the following scenario:
>>>
>>> A collection where each shard has 1 TLOG replica and N PULL replicas. A
>>> shard leader goes down, indexing fails on the shard for a period of time,
>>> Kube restarts the leader, and indexing succeeds on the shard again. PULL
>>> replicas continue to accept queries the entire time.
>>>
>>> There are three main advantages of this kind of setup:
>>>
>>> 1) Potential for zero data loss. In this scenario indexing either succeeds
>>> or fails. We no longer have the data loss that comes from the lack of a
>>> two-phase commit across a set of TLOG or NRT replicas. Now there is only
>>> one shard leader, which has a transaction redo log, and this makes zero
>>> data loss much, much easier to achieve.
>>>
>>> 2) Improved cluster stability.  Restarting the leader is far simpler than
>>> electing a new leader, peer syncing, index fingerprinting, etc., and would
>>> eliminate a whole class of operational issues.
>>>
>>> 3) Phasing out NRT, and maybe even leader election, in the code base
>>> greatly reduces code complexity and allows committers to harden the
>>> eventually consistent model.
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
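
Ilan's alternative (skip the election and send updates to one of the most
up-to-date replicas, judged by the replica terms in ZK) might look roughly
like the following.  This is a minimal Python sketch, not Solr code: it
assumes the terms have already been read from ZooKeeper into a plain
`replica -> term` mapping, and `pick_indexing_target` and the replica
names are illustrative.  It refuses to pick a replica whose term lags the
highest known term, so updates fail rather than land on a stale copy.

```python
from typing import Mapping, Optional, Set


def pick_indexing_target(terms: Mapping[str, int], live: Set[str]) -> Optional[str]:
    """Choose a replica to receive updates, without an election.

    terms: replica name -> replica term (assumed already read from ZK).
    live:  replicas currently believed to be up.
    Returns None when no live replica holds the highest term, i.e. every
    fully up-to-date replica is down and indexing should fail.
    """
    if not terms:
        return None
    max_term = max(terms.values())
    candidates = sorted(
        name for name, term in terms.items()
        if term == max_term and name in live
    )
    # Sorting makes the choice deterministic, so clients sharing the same
    # view of ZK route updates to the same replica.
    return candidates[0] if candidates else None


if __name__ == "__main__":
    terms = {"core_node1": 3, "core_node2": 3, "core_node3": 2}
    print(pick_indexing_target(terms, {"core_node2", "core_node3"}))  # prints core_node2
```

Determinism narrows but does not remove the race Ilan points out: clients
with different views of ZK can still pick different replicas, so the
update path must still fail one or both batches when propagation fails.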
