I think the problem has been a mix of cases. For many, search is not the system of record, so data loss is not great, but it's not the end of the world either. For others, the number of issues you could run into caused them to take different approaches. If you can keep the ZooKeeper connection stable enough, you can avoid a good chunk of the issues, so you might limit core sizes, throttle updates, and try to contain everything relative to the hardware. Some likely do not notice the data loss that happens. Some have extensive monitoring and paging and intervene early, with access to backups or, in rarer cases, a cross-data-center paired cluster.
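By "throttle updates" I mostly mean bounding how much indexing you let in flight relative to what the hardware and the ZK sessions can absorb. The crudest client-side version is something like this (a made-up sketch, nothing that exists in Solr itself):

// Hypothetical client-side cap on in-flight update requests; not part of Solr.
import java.util.concurrent.Semaphore;

public class UpdateThrottle {

  private final Semaphore inFlight;

  public UpdateThrottle(int maxInFlight) {
    // Size this to what the cluster can absorb without destabilizing ZK sessions.
    this.inFlight = new Semaphore(maxInFlight);
  }

  public void send(Runnable updateRequest) throws InterruptedException {
    inFlight.acquire();
    try {
      updateRequest.run(); // e.g. an HTTP POST to /update, abstracted away here
    } finally {
      inFlight.release();
    }
  }
}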
Due to the difficulty of upgrades (perhaps more difficult the further back you go, I don't know), dev tends to be way ahead of what people are actually running. Users were on 4.x for an extremely long time after dev had moved past the 4.x branch, and a large number are still on 5.x now. Large companies put a lot of internal effort into different mitigations or workarounds in these cases. Even if they are more up to date, they are building a project now; doing dev upstream and waiting for a release is usually not really considered an option. Other companies that might have had more alignment with that type of work have their own unique histories and compounding reasons why it was never really invested in. In the early days there were so many more issues you faced; many are just willing to stomach a lot of pain for the search side, which is actually pretty decent. It's free, it's Apache, it's in a niche without a lot of competition. The license and the open source model trump a technical decision many times, and then it's up to the developers to figure it out.

LIR (I know it's a weird name for it now; it made a little more sense with the implementation it replaced) is pretty solid though. There are a few issues, I'll see if I can dig a couple up.

Yeah, there is probably no reason transient cores could not work with cloud. I'm sure they would be a trip today; I have little trust in them in general today. That's part of why I pushed to make SolrCores so efficient as well: I didn't even want to touch transient cores given the state they looked to be in. Compounding reasons too: SolrCores are just heavy to load and heavy to reload. Reload a collection with a lot of SolrCores on a machine and take a look at those system stats. But making them really cheap is beyond what anyone would invest in such a project, and it doesn't deal with winding down large data structures like caches when they are not being used anyway. It just aligned with my desire to finally see thousands of cores per container work, all live, and I've always had a distaste for transient cores given I think they likely have a list of problems and have never had any first-class consideration, dev, or tests. They would also be nicer with super fast SolrCore loads, which is another reason I wasn't too thrilled with using them as a solution without making SolrCores cheap. Anyway, I don't see why transient cores couldn't work with Cloud, and depending on how many you might have to spin up at once, probably just fine for most use cases given you will have cold caches anyway.

- Mark

On Tue, Oct 5, 2021 at 2:56 AM Ilan Ginzburg <[email protected]> wrote:

> Do typical setups (users) of SolrCloud not care about avoiding data loss in the cluster, or are clusters maintained stable enough that these issues do not happen? I feel moving to a cloud infra drives more instability, and I suspect these issues become more prevalent/problematic then.
>
> Indeed Terms (LIR) and shard leader election serve the same purpose. Maybe one could go... I'd like to have a cluster where some replicas are not loaded in memory (transient cores) and do not even participate in all the ZK activity until they're needed. This would open the door to a ridiculously high number of replicas (limited by disk size, not by memory, CPU, connections, threads or anything else).
>
> This would serve well a search hosting use case where most tenants are not active at any given time, but there are many of them.
>
> I don't know if it's a realistic evolution of SolrCloud or should be considered science fiction at this stage.
>
> Ilan
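Since terms keep coming up, the model itself is dead simple: every replica carries a per-shard term in ZK, the terms of replicas that acknowledged an update get bumped past any replica that missed it, and only a replica sitting at the highest known term is fit to lead. A rough sketch of that check, with made-up names rather than the real ZkShardTerms API:

// Hypothetical sketch of term-based leader fitness; not the actual ZkShardTerms code.
import java.util.Map;

public class LeaderFitness {

  // A replica is fit to lead only if its term equals the highest term known
  // for the shard, i.e. it has seen every update that was acknowledged.
  static boolean fitToLead(String coreNodeName, Map<String, Long> shardTerms) {
    long highest = shardTerms.values().stream().mapToLong(Long::longValue).max().orElse(0L);
    return shardTerms.getOrDefault(coreNodeName, 0L) == highest;
  }

  public static void main(String[] args) {
    // replica2 missed an update, so its term was left behind and it is not fit to lead.
    Map<String, Long> terms = Map.of("replica1", 3L, "replica2", 2L, "replica3", 3L);
    System.out.println(fitToLead("replica1", terms)); // true
    System.out.println(fitToLead("replica2", terms)); // false
  }
}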
> On Tue, Oct 5, 2021 at 7:33 AM Mark Miller <[email protected]> wrote:
>
>> Well, the replicas are still waiting for the leader, so it's not no wait; you just don't have leaders waiting for full shards, which lessens the problem. A leader should have no wait and should release any waiting replicas.
>>
>> That core sorter should be looking at LIR to start the leader-capable replicas first.
>>
>> Mark
>>
>> On Tue, Oct 5, 2021 at 12:10 AM Mark Miller <[email protected]> wrote:
>>
>>> On Mon, Oct 4, 2021 at 5:24 AM Ilan Ginzburg <[email protected]> wrote:
>>>
>>>> Thanks Mark for your write-ups! This is an area of SolrCloud I'm currently actively exploring at work (might publish my notes as well at some point).
>>>>
>>>> I think the terms value (fitness to become leader) should participate in the election node ordering, as well as a terms goal (based on the highest known term for the shard), and clear options of stalled election vs. data loss (and if data loss is not acceptable, an external process, automated or human, likely has to intervene to unblock the stalled election).
>>>
>>> There should only be a stalled election if no one can become leader for some odd reason - agreed it would be good to be able to detect that vs. cycling forever.
>>>
>>> I've also always thought that should probably be a configuration. If you know a replica with more data is absent, you just bail (currently it will wait and then continue), or an operator could configure it to continue with the data you have regardless.
>>>
>>> I used to think about those things, but it's a boring area to care about or work on given no one else does.
>>>
>>>> Even if all updates hit multiple replicas, nothing guarantees that any of these copies is present when another replica (without the update) starts. If we don't want to wait at startup for other replicas to join an election (this can't scale even though CoreSorter does its best... but it is the most convoluted Comparator I've ever seen), we might need the notion of an "incomplete leader", i.e. a replica that is the current elected leader but does not have all the data (at some later point we might decide to accept the loss and consider it the leader, or when a better positioned replica joins, have it become leader). This will require revisiting quite a few assumptions, so it should likely be associated with a thorough cleanup (and a move to Curator election?).
>>>
>>> A new replica that comes up should fit into the above logic - it will have a term of 0 and other replicas will have higher terms, and you will know that from zk - so either you fail shard startup or, like today, decide the other replicas are not coming and continue on.
>>>
>>> I don't think that core sorter does the best sort, based on the last time I looked at it.
>>>
>>> It used to actually matter much more though - because the shard could not start until all the replicas were up, so with many replicas, shards, and collections, depending on the order you could easily have to start a huge number of cores to complete a shard before it got moving.
>>>
>>> But that issue should not be nearly the issue that it was. That's again from a world with no LIR info. There should be no reason to wait for all replicas today, no reason to have them all involved in a sync. Any replica with the highest LIR term can become leader, so sorting to complete shards might be nice, but it should not be needed like it was. (It still is needed, but again because we still wait when you don't need to.)
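To make the core sorter point concrete, a term-aware sort is about all it would take - something roughly like this (a made-up sketch, not the real CoreSorter, and the descriptor type is invented): load the replicas that can actually take leadership first, so shards get a leader and release their waiters as early as possible.

// Hypothetical sketch: order core loading so leader-capable replicas start first.
// CoreInfo is a made-up stand-in for a core descriptor plus the terms read from ZK;
// the real CoreSorter works differently.
import java.util.Comparator;
import java.util.List;

public class StartupOrder {

  record CoreInfo(String coreName, long myTerm, long highestTermInShard) {
    boolean leaderCapable() {
      // Same rule as in the earlier sketch: only a replica at the highest known term can lead.
      return myTerm == highestTermInShard;
    }
  }

  // Leader-capable cores first; fall back to name order so the sort is stable.
  static final Comparator<CoreInfo> LEADER_CAPABLE_FIRST =
      Comparator.comparing((CoreInfo c) -> !c.leaderCapable())
                .thenComparing(CoreInfo::coreName);

  public static void main(String[] args) {
    List<CoreInfo> cores = List.of(
        new CoreInfo("shard1_replica2", 4, 5),  // behind, cannot lead
        new CoreInfo("shard1_replica1", 5, 5),  // at the highest term, can lead
        new CoreInfo("shard2_replica1", 7, 7)); // at the highest term, can lead
    cores.stream().sorted(LEADER_CAPABLE_FIRST).forEach(c -> System.out.println(c.coreName()));
  }
}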
>>> Technically, you could do a much simpler leader election. With an overseer, it could just pick the leader. With or without one, a replica that sees it has the highest term could just try to create the leader node - first one wins.
>>>
>>> The current leader election is a recipe zk promotes to avoid a thundering herd effect - you can have tons of participants and it's an efficient flow vs. 100 participants fighting to see who creates a zk node on every new election.
>>>
>>> But generally we have 3 replicas. Some outlier users might use more, but even still it's not going to be that many.
>>>
>>> Mark
>>>
>>>> Ilan
>>>>
>>>> On Sun, Oct 3, 2021 at 4:27 AM Mark Miller <[email protected]> wrote:
>>>>
>>>>> I filed https://issues.apache.org/jira/browse/SOLR-15672 (Leader Election is flawed) for future reference, in case anyone looks at tackling leader election issues. I'll drop a couple of notes and random suggestions there.
>>>>>
>>>>> Mark
>>>>>
>>>>> On Sat, Oct 2, 2021 at 12:47 PM Mark Miller <[email protected]> wrote:
>>>>>
>>>>>> At some point digging through some of this stuff, I often start to think: I wonder how good our tests are at catching certain categories of problems. Various groups of code branches and behaviors.
>>>>>>
>>>>>> I do notice that as I get the tests flying, they do start to pick up a lot more issues. A lot more bugs and bad behavior. And as they start to near max out, I start feeling a little better about a lot of it. But then I'm looking at things outside of tests still as well. Using my own tools and setups, using stuff from others. Being cruel in my expectations. And by then I've come a long way, but I can still find problems. Run into bad situations. If I push, and when I make it so I can push harder, I push even harder. And I want the damn thing solid. Why come all this way if I can't have really and truly solid? And that's when I reach for collection creation mixed with cluster restarts. How about I shove 100000 SolrCores across 10000 collections right down its mouth on a handful of instances on a single machine in like a minute timeframe? How about 30 seconds? How about more collections? How about lower time frames? Vary things around. Let's just swamp it and demand the setup eats it in silly time frames and stands up at the end correct and happy. And then I start to get to the bottom of the barrel on what's subverting my solidness. But as I've always said, more and more targeted tests along with simpler and more understandable implementations will also cover a lot more ground. I certainly have pushed on simpler implementations. I've never gotten to the point where I have the energy and time to just push on more, better, and more targeted tests, more unit tests, more Mockito, more Awaitility as Tim suggested, etc.
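Coming back to the "first one wins" idea above, this is roughly all it amounts to - a sketch with the plain ZooKeeper client and made-up paths, not the actual SolrCloud election code, and it assumes the caller has already checked that it holds the highest term:

// Hypothetical "first one wins" election sketch using the plain ZooKeeper client.
// Paths and the data format are made up; this is not SolrCloud's election code.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleElection {

  // Returns true if we created the ephemeral leader node (we won), false if another
  // replica got there first. The ephemeral node disappears when the winner's session
  // dies, at which point the remaining replicas can race again.
  static boolean tryToBecomeLeader(ZooKeeper zk, String shardPath, String coreNodeName)
      throws KeeperException, InterruptedException {
    try {
      zk.create(shardPath + "/leader", coreNodeName.getBytes(),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true; // first one wins
    } catch (KeeperException.NodeExistsException e) {
      return false; // someone else won; watch the node and retry when it goes away
    }
  }
}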
--
- Mark

http://about.me/markrmiller
