Do typical SolrCloud setups (users) not care about avoiding data loss in the cluster, or are clusters kept stable enough that these issues simply do not happen? I feel that moving to a cloud infra drives more instability, and I suspect these issues will become more prevalent/problematic then.
Indeed Terms (LIR) and shard leader election serve the same purpose. Maybe one could go...

I'd like to have a cluster where some replicas are not loaded in memory (transient cores) and do not even participate in all the ZK activity until they're needed. This would open the door to a ridiculously high number of replicas (limited by disk size, not by memory, CPU, connections, threads or anything else). That would serve a search hosting use case well, where most tenants are not active at any given time but there are many of them. I don't know if it's a realistic evolution of SolrCloud or if it should be considered science fiction at this stage.

Ilan

On Tue, Oct 5, 2021 at 7:33 AM Mark Miller <[email protected]> wrote:

> Well, the replicas are still waiting for the leader, so it's not no wait - you just don't have leaders waiting for full shards, which lessens the problem. A leader should have no wait and should release any waiting replicas.

> That core sorter should be looking at LIR to start the leader-capable replicas first.

> Mark

> On Tue, Oct 5, 2021 at 12:10 AM Mark Miller <[email protected]> wrote:

>> On Mon, Oct 4, 2021 at 5:24 AM Ilan Ginzburg <[email protected]> wrote:

>>> Thanks Mark for your write-ups! This is an area of SolrCloud I'm currently actively exploring at work (I might publish my notes as well at some point).

>>> I think the terms value (fitness to become leader) should participate in the election node ordering, as well as a terms goal (based on the highest known term for the shard), along with clear options for stalled election vs. data loss (and if data loss is not acceptable, an external process, automated or human, likely has to intervene to unblock the stalled election).

>> There should only be a stalled election if no one can become leader for some odd reason - agreed it would be good to be able to detect that vs. cycling forever.

>> I've also always thought that should probably be a configuration. If you know a replica with more data is absent, you just bail (currently it will wait and then continue), or an operator could configure it to continue with the data you have regardless.

>> I used to think about those things, but it's a boring area to care about or work on given no one else does.

>>> Even if all updates hit multiple replicas, nothing guarantees that any of these copies is present when another replica (without the update) starts. If we don't want to wait at startup for other replicas to join an election (this can't scale, even though CoreSorter does its best... but it is the most convoluted Comparator I've ever seen), we might need the notion of an "incomplete leader", i.e. a replica that is the current elected leader but does not have all the data (at some later point we might decide to accept the loss and consider it the leader, or, when a better-positioned replica joins, have that one become leader). This will require revisiting quite a few assumptions, so it should likely be paired with a thorough clean-up (and a move to Curator election?).

>> A new replica that comes up should fit into the above logic - it will have a term of 0, other replicas will have higher terms, and you will know that from ZK - so either you fail shard startup or, like today, the other replicas are not coming and you continue on.

>> I don't think that core sorter does the best sort, based on the last time I looked at it.
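A minimal sketch of the term-aware startup ordering being discussed here, assuming a hypothetical ReplicaStartupInfo type standing in for whatever the real core descriptors and ZK shard terms expose (this is not Solr's actual CoreSorter):

import java.util.Comparator;
import java.util.List;

// Sketch only: order cores for startup so that, within each shard, the replica
// with the highest known (LIR) term comes first and can take leadership without
// waiting on the rest. ReplicaStartupInfo is a hypothetical stand-in, not a Solr type.
class ReplicaStartupInfo {
  final String collection;
  final String shardId;
  final long term; // highest term recorded for this replica in ZK shard terms

  ReplicaStartupInfo(String collection, String shardId, long term) {
    this.collection = collection;
    this.shardId = shardId;
    this.term = term;
  }
}

class TermAwareCoreSorter {
  // Group cores by collection and shard, then start the highest-term replica first.
  static void sortForStartup(List<ReplicaStartupInfo> cores) {
    cores.sort(
        Comparator.comparing((ReplicaStartupInfo r) -> r.collection)
            .thenComparing(r -> r.shardId)
            .thenComparing(Comparator.comparingLong((ReplicaStartupInfo r) -> r.term).reversed()));
  }
}

Whether something like this belongs in the core sorter or in the election path itself is exactly the open question in this thread.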
>> It used to actually matter much more, though - because the shard could not start until all the replicas were up, so with many replicas, shards, and collections, depending on order you could easily have to start a huge number of cores to complete a shard before it got moving.

>> But that issue should not be nearly the issue that it was. That's again from a world with no LIR info. There should be no reason to wait for all replicas today, no reason to have them all involved in a sync. Any replica with the highest LIR term can become leader, so sorting to complete shards might be nice, but it should not be needed like it was. (It still is needed, but again because we still wait when we don't need to.)

>> Technically, you could do a much simpler leader election. With an overseer, it could just pick the leader. With or without one, a replica that sees it has the highest term could just try to create the leader node - first one wins.

>> The current leader election is a recipe ZK promotes to avoid a thundering herd effect - you can have tons of participants and it's an efficient flow vs. 100 participants fighting to see who creates a ZK node every new election.

>> But generally we have 3 replicas. Some outlier users might use more, but even still it's not going to be that many.

>> Mark

>>> Ilan

>>> On Sun, Oct 3, 2021 at 4:27 AM Mark Miller <[email protected]> wrote:

>>>> I filed https://issues.apache.org/jira/browse/SOLR-15672 Leader Election is flawed - for future reference if anyone looks at tackling leader election issues. I'll drop a couple of notes and random suggestions there.

>>>> Mark

>>>> On Sat, Oct 2, 2021 at 12:47 PM Mark Miller <[email protected]> wrote:

>>>>> At some point digging through some of this stuff, I often start to think: I wonder how good our tests are at catching certain categories of problems. Various groups of code branches and behaviors.

>>>>> I do notice that as I get the tests flying, they do start to pick up a lot more issues. A lot more bugs and bad behavior. And as they start to near max out, I start feeling a little better about a lot of it. But then I'm looking at things outside of tests still as well. Using my own tools and setups, using stuff from others. Being cruel in my expectations. And by then I've come a long way, but I can still find problems. Run into bad situations. If I push, and when I make it so I can push harder, I push even harder. And I want the damn thing solid. Why come all this way if I can't have it really and truly solid? And that's when I reach for collection creation mixed with cluster restarts. How about I shove 100,000 SolrCores, 10,000 collections right down its mouth on a handful of instances on a single machine in like a minute timeframe. How about 30 seconds. How about more collections. How about lower time frames. Vary things around. Let's just swamp it and demand the setup eats it in silly time frames and stands up at the end correct and happy. And then I start to get to the bottom of the barrel on what's subverting my solidness. But as I've always said, more and better targeted tests, along with simpler and more understandable implementations, will also cover a lot more ground. I certainly have pushed on simpler implementations.
>>>>> I've never gotten to the point where I have the energy and time to just push on more, better and more targeted tests, more unit tests, more Mockito, more Awaitility as Tim suggested, etc.

>>>>> --
>>>>> - Mark
>>>>> http://about.me/markrmiller

>>>> --
>>>> - Mark
>>>> http://about.me/markrmiller

>> --
>> - Mark
>> http://about.me/markrmiller

> --
> - Mark
> http://about.me/markrmiller
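A rough sketch of the "first one wins" election Mark describes earlier in the thread (a replica that sees it has the highest term simply tries to create the leader node), written against the plain ZooKeeper client. The znode path layout and the term arguments are made up for illustration; this is not Solr's actual election code:

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of "first one wins" leader claiming: a replica that has the highest
// known term for its shard tries to create an ephemeral leader znode; if the
// node already exists, another replica got there first. The path and term
// check are illustrative only, not Solr's real election protocol.
class SimpleLeaderClaim {

  private final ZooKeeper zk;

  SimpleLeaderClaim(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Returns true if this replica became leader, false if it is unfit or another replica won. */
  boolean tryClaimLeader(String shardPath, String coreNodeName, long myTerm, long highestKnownTerm)
      throws KeeperException, InterruptedException {
    if (myTerm < highestKnownTerm) {
      // A better-positioned replica exists; don't claim, to avoid silent data loss.
      return false;
    }
    try {
      // Ephemeral: if this process dies, the leader node goes away and a new claim can happen.
      zk.create(shardPath + "/leader", coreNodeName.getBytes(StandardCharsets.UTF_8),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true; // we created the node: first one wins
    } catch (KeeperException.NodeExistsException e) {
      return false; // another replica already claimed leadership for this shard
    }
  }
}

With the usual 3 replicas per shard, the thundering herd that the sequential-node election recipe guards against stays small, which is the trade-off being pointed out above.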
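And on the "more Awaitility" point at the end of the thread, a small sketch of the pattern it suggests: poll a cluster condition with a bounded timeout instead of sleeping. The ClusterProbe interface and its isShardStable() check are hypothetical, purely to show the shape of such a test helper:

import static org.awaitility.Awaitility.await;

import java.time.Duration;

// Sketch only: wait for a condition with a timeout instead of Thread.sleep in tests.
// ClusterProbe and isShardStable() are hypothetical placeholders.
class ClusterTestSupport {

  interface ClusterProbe {
    boolean isShardStable(String collection, String shardId); // hypothetical check
  }

  static void waitForStableShard(ClusterProbe probe, String collection, String shardId) {
    await("shard " + shardId + " of " + collection + " to become stable")
        .atMost(Duration.ofSeconds(30))
        .pollInterval(Duration.ofMillis(250))
        .until(() -> probe.isShardStable(collection, shardId));
  }
}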
