Well, the replicas are still waiting for the leader, so it’s not "no wait";
you just don’t have leaders waiting for full shards, which lessens the
problem. A leader should not wait at all, and it should release any waiting
replicas.
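To make the "leader should not wait" point concrete, here is a minimal Python sketch. It assumes a single in-process shard, and `ShardStartup`, `become_leader`, and `wait_for_leader` are hypothetical names, not Solr APIs. Replicas block on an event until a leader exists. The leader sets that event as soon as it is elected, without waiting for anyone else.

```python
import threading
from typing import Optional


class ShardStartup:
    """Hypothetical sketch: replicas block until a leader exists, while the
    leader itself never waits for other replicas and releases waiters at once."""

    def __init__(self) -> None:
        self._leader_ready = threading.Event()
        self.leader: Optional[str] = None

    def become_leader(self, replica: str) -> None:
        # Record the leader first, then wake every replica blocked in wait_for_leader.
        self.leader = replica
        self._leader_ready.set()

    def wait_for_leader(self, timeout_s: float) -> Optional[str]:
        """Return the leader's name, or None if none appeared within timeout_s seconds."""
        if self._leader_ready.wait(timeout_s):
            return self.leader
        return None
```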

The core sorter should be looking at LIR to start the leader-capable
replicas first.
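Here is a minimal sketch of that ordering. It assumes each local core's LIR term and the shard's highest term are already known (in practice they would be read from ZK). `CoreInfo` and `startup_order` are hypothetical and are not Solr's actual CoreSorter.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CoreInfo:
    # Hypothetical descriptor, not Solr's CoreDescriptor.
    name: str
    shard: str
    term: int  # this replica's LIR term (assumed already read from ZK)


def startup_order(cores: List[CoreInfo], max_terms: Dict[str, int]) -> List[CoreInfo]:
    """Start leader-capable cores (term equals the shard's highest term) first."""

    def key(core: CoreInfo):
        leader_capable = core.term == max_terms[core.shard]
        # False sorts before True, so negating puts leader-capable cores first;
        # the name breaks ties to keep the order deterministic.
        return (not leader_capable, core.name)

    return sorted(cores, key=key)
```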

Mark

On Tue, Oct 5, 2021 at 12:10 AM Mark Miller wrote:

>
>
> On Mon, Oct 4, 2021 at 5:24 AM Ilan Ginzburg wrote:
>
>> Thanks, Mark, for your write-ups! This is an area of SolrCloud I'm
>> actively exploring at work right now (I might publish my notes as well at
>> some point).
>>
>> I think the term value (fitness to become leader) should participate in the
>> election node ordering, along with a term goal (based on the highest known
>> term for the shard). There should also be clear options for a stalled
>> election vs. data loss (if data loss is not acceptable, an external process,
>> automated or human, likely has to intervene to unblock the stalled election).
>>
>
> There should only be a stalled election if no one can become leader for
> some odd reason; agreed, it would be good to be able to detect that rather
> than cycling forever.
>
> I’ve also always thought that should probably be configurable. If you
> know an absent replica has more data, you just bail (currently it will
> wait and then continue), or an operator could configure it to continue with
> the data you have regardless.
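Here is a sketch of what such a configuration could look like. It assumes the highest known term for the shard is available from ZK. The policy names and the `decide` function are hypothetical. The `WAIT_THEN_CONTINUE` branch mirrors the current behavior as described in the paragraph above.

```python
from enum import Enum


class MissingDataPolicy(Enum):
    # Hypothetical setting values, for illustration only.
    FAIL = "fail"                # bail: never elect a replica known to lack data
    WAIT_THEN_CONTINUE = "wait"  # wait for the better replica, then give up on it
    CONTINUE = "continue"        # proceed with the data present, accepting loss


def decide(my_term: int, highest_known_term: int, waited_s: float,
           wait_limit_s: float, policy: MissingDataPolicy) -> str:
    """Return "lead", "wait", or "fail" for a replica considering leadership."""
    if my_term >= highest_known_term:
        return "lead"  # no known replica has more data than this one
    if policy is MissingDataPolicy.CONTINUE:
        return "lead"
    if policy is MissingDataPolicy.FAIL:
        return "fail"
    return "lead" if waited_s >= wait_limit_s else "wait"
```

A brand-new replica (term 0) whose peers hold higher terms falls into the same logic. Under `FAIL` it refuses to lead; under `WAIT_THEN_CONTINUE` it stops waiting once the limit is reached.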
>
> I used to think about those things, but it’s a boring area to care about
> or work on, given that no one else does.
>
>
>> Even if all updates hit multiple replicas, nothing guarantees that any of
>> these copies is present when another replica (without the update) starts.
>> If we don't want to wait at startup for other replicas to join an election
>> (this can't scale, even though CoreSorter does its best... but it is the
>> most convoluted Comparator I've ever seen), we might need the notion of an
>> "incomplete leader", i.e. a replica that is the currently elected leader but
>> does not have all the data (at some later point we might decide to accept
>> the loss and keep it as leader or, when a better-positioned replica joins,
>> have that replica become leader). This will require revisiting quite a few
>> assumptions, so it should likely go along with a thorough cleanup (and a
>> move to Curator election?).
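The "incomplete leader" idea can be modeled as a small state object. This is a hypothetical Python sketch; `ShardLeadership` and its methods do not exist in Solr. A leader stays incomplete while its term is below the highest known term. That ends either when an operator accepts the loss or when a better-positioned replica takes over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShardLeadership:
    # Hypothetical model of an "incomplete leader"; not an existing Solr class.
    highest_known_term: int
    leader: Optional[str] = None
    leader_term: int = -1
    accepted_loss: bool = False

    @property
    def incomplete(self) -> bool:
        """True while the elected leader is known to be missing data."""
        return (self.leader is not None
                and self.leader_term < self.highest_known_term
                and not self.accepted_loss)

    def elect(self, replica: str, term: int) -> None:
        self.leader, self.leader_term = replica, term

    def on_replica_joined(self, replica: str, term: int) -> None:
        # A better-positioned replica takes over from an incomplete leader.
        if self.incomplete and term > self.leader_term:
            self.elect(replica, term)

    def accept_loss(self) -> None:
        # Operator (or automation) decides the current leader is good enough.
        self.accepted_loss = True
```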
>>
>
> A new replica that comes up should fit into the above logic: it will have
> a term of 0, other replicas will have higher terms, and you will know that
> from ZK. So either you fail shard startup or, like today, you decide the
> other replicas are not coming and continue on.
>
> Based on the last time I looked at it, I don’t think the core sorter does
> the best sort.
>
> It actually used to matter much more, though, because the shard could not
> start until all the replicas were up. With many replicas, shards, and
> collections, depending on the order, you could easily have to start a huge
> number of cores to complete a shard before it got moving.
>
> But that should not be nearly the issue it was. That again comes from a
> world with no LIR info. Today there should be no reason to wait for all
> replicas, and no reason to have them all involved in a sync. Any replica
> with the highest LIR term can become leader, so sorting to complete shards
> might be nice, but it should not be needed like it was. (It is still
> needed, but only because we still wait when we don’t need to.)
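Here is a minimal sketch of the "any replica with the highest term can lead" rule. It assumes terms are available as a replica-to-term map, and the helper names are hypothetical rather than Solr's API. The election can proceed as soon as one live replica holds the top term, without waiting for the others.

```python
from typing import Dict, Iterable, Set


def leader_capable(terms: Dict[str, int]) -> Set[str]:
    """Replicas whose term equals the shard's highest term; any of them may lead."""
    if not terms:
        return set()
    top = max(terms.values())
    return {replica for replica, term in terms.items() if term == top}


def can_start_election(terms: Dict[str, int], live: Iterable[str]) -> bool:
    """True once any live replica holds the highest term; no need to wait for the rest."""
    return bool(leader_capable(terms) & set(live))
```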
>
> Technically, you could do a much simpler leader election. With an
> Overseer, it could just pick the leader. With or without one, a replica that
> sees it has the highest term could just try to create the leader node;
> the first one wins.
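The "first one wins" election relies on a ZooKeeper create failing when the node already exists. The sketch below simulates that with an in-memory `FakeZk`, a hypothetical stand-in rather than a real ZooKeeper client. The `/leaders/<shard>` path is also made up for illustration.

```python
import threading
from typing import Dict, Optional


class NodeExistsError(Exception):
    """Raised when creating a node that already exists (as ZooKeeper's create does)."""


class FakeZk:
    """In-memory stand-in for ZooKeeper's atomic create; not a real client."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}
        self._lock = threading.Lock()

    def create(self, path: str, data: str) -> None:
        with self._lock:  # check-and-set is atomic, like a ZK create
            if path in self._nodes:
                raise NodeExistsError(path)
            self._nodes[path] = data

    def get(self, path: str) -> Optional[str]:
        with self._lock:
            return self._nodes.get(path)


def try_become_leader(zk: FakeZk, shard: str, replica: str,
                      my_term: int, highest_term: int) -> bool:
    """Replicas holding the highest term race to create the leader node."""
    if my_term < highest_term:
        return False  # not leader-capable, so don't even try
    try:
        zk.create(f"/leaders/{shard}", replica)  # hypothetical path
        return True
    except NodeExistsError:
        return False  # another replica won the race
```

With a real client, the node would be created as ephemeral so that it disappears if its owner's session expires.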
>
> The current leader election is a recipe ZK promotes to avoid a thundering
> herd effect: you can have tons of participants and it’s an efficient flow,
> versus 100 participants fighting to see who creates a ZK node on every new
> election.
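For comparison, here is a toy in-memory model of the ZooKeeper election recipe that the current code follows. Each participant gets a sequential node and watches only its immediate predecessor. When a node goes away, exactly one participant is notified instead of all of them. `ElectionSim` is hypothetical and ignores real sessions and watches.

```python
from typing import List, Optional, Tuple


class ElectionSim:
    """Toy model of ZooKeeper's leader-election recipe (no real sessions/watches):
    each participant holds a sequential node and watches only its predecessor."""

    def __init__(self) -> None:
        self._next_seq = 0
        self.nodes: List[Tuple[int, str]] = []  # (sequence number, participant), ascending

    def join(self, participant: str) -> None:
        self.nodes.append((self._next_seq, participant))
        self._next_seq += 1

    def leader(self) -> Optional[str]:
        # The lowest sequence number is the leader.
        return self.nodes[0][1] if self.nodes else None

    def watcher_of(self, participant: str) -> Optional[str]:
        """The participant watching this one's node, i.e. its successor."""
        names = [p for _, p in self.nodes]
        i = names.index(participant)
        return names[i + 1] if i + 1 < len(names) else None

    def leave(self, participant: str) -> List[str]:
        """Delete a participant's node and return who gets notified."""
        notified = self.watcher_of(participant)
        self.nodes = [(s, p) for s, p in self.nodes if p != participant]
        return [notified] if notified is not None else []
```

If every participant watched a single leader node instead, removing it would wake all remaining participants at once. That is the herd effect the recipe avoids. With around 3 replicas per shard, the cost is small either way.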
>
> But generally we have 3 replicas. Some outlier users might use more, but
> even then it’s not going to be that many.
>
> Mark
>
>
>> Ilan
>>
>>
>>
>> On Sun, Oct 3, 2021 at 4:27 AM Mark Miller wrote:
>>
>>> I filed
>>> https://issues.apache.org/jira/browse/SOLR-15672 ("Leader Election is
>>> flawed") for future reference, if anyone looks at tackling leader election
>>> issues. I’ll drop a couple of notes and random suggestions there.
>>>
>>> Mark
>>>
>>> On Sat, Oct 2, 2021 at 12:47 PM Mark Miller wrote:
>>>
>>>> At some point, digging through some of this stuff, I often start to
>>>> wonder how good our tests are at catching certain categories of
>>>> problems: various groups of code branches and behaviors.
>>>>
>>>> I do notice that as I get the tests flying, they start to pick up a
>>>> lot more issues, a lot more bugs and bad behavior. And as they start to
>>>> near max out, I start feeling a little better about a lot of it. But then
>>>> I’m still looking at things outside of tests as well: using my own tools
>>>> and setups, using stuff from others, being cruel in my expectations. By
>>>> then I’ve come a long way, but I can still find problems and run into bad
>>>> situations. If I push, and once I make it so I can push harder, I push even
>>>> harder. And I want the damn thing solid. Why come all this way if I can’t
>>>> have really and truly solid? That’s when I reach for collection
>>>> creation mixed with cluster restarts. How about I shove 100000 SolrCores
>>>> in 10000 collections right down its mouth, on a handful of instances on a
>>>> single machine, in about a minute? How about 30 seconds? How about
>>>> more collections? How about shorter timeframes? Vary things around. Let’s
>>>> just swamp it and demand that the setup eats it in silly timeframes and
>>>> stands up at the end correct and happy. That’s when I start to get to the
>>>> bottom of the barrel on what’s subverting my solidness.
>>>>
>>>> But as I’ve always said, more, and more targeted, tests along with simpler
>>>> and more understandable implementations will also cover a lot more ground.
>>>> I have certainly pushed on simpler implementations. I’ve never gotten to
>>>> the point where I have the energy and time to just push on more, better,
>>>> and more targeted tests: more unit tests, more Mockito, more Awaitility as
>>>> Tims suggested, etc.
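Awaitility is a Java library for writing "wait until this condition holds" in tests instead of fixed sleeps. Below is a rough Python analogue of that polling idea. `await_until` is a hypothetical helper and does not reflect Awaitility's actual API.

```python
import time
from typing import Callable


def await_until(condition: Callable[[], bool], timeout_s: float = 5.0,
                poll_s: float = 0.05) -> None:
    """Poll condition until it returns truthy, or raise TimeoutError after timeout_s.
    Hypothetical helper in the spirit of Awaitility; not its actual API."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(poll_s)
```

A test using this waits only as long as it has to and fails with a clear timeout, rather than sleeping for a fixed time and hoping the cluster has settled.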
>>>> --
>>>> - Mark
>>>>
>>>> http://about.me/markrmiller
>>>>
>
