Tim was referring to code that addresses those issues from the ref branch.

I’ve been trying to remember the items that Ilan has brought up - for some
reason I thought this was the third one, but other than leader sync I can
only come up with shard leader loss and overseer loss.

I also recalled I have slide tools for this type of thing, so here is a quick
browse through leader election. It’s a bit more irreverent, because I can only
skim the surface of the complexity involved and because there are no real
small-effort, high-impact improvements.

https://www.solrdev.io/leader-election-adventure.html


On Sat, Oct 2, 2021 at 7:51 AM David Smiley <[email protected]> wrote:

> I just want to say that I appreciate the insights you shared over the last
> couple days.  By "copy paste", was Tim referring to copying your insights
> and pasting them into the code?  This is what I was thinking.  Or at least
> some way to make these insights more durable / findable.  Could be a link
> from the code into maybe a wiki page or something.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Oct 1, 2021 at 11:02 PM Mark Miller <[email protected]> wrote:
>
>> Tim hit me with the obvious question here.
>>
>> “I’m assuming there are reasons, but what about a little copy paste on
>> some of these issues that you mentioned.”
>>
>> I say the obvious question because I kind of flippantly jump through some
>> lines of code and then say you just do a, b, and c and that’s the ballgame.
>>
>> There are a lot of reasons I can’t just cut and paste, though, and I can
>> open almost any class and annotate a similar set of issues. Without diving
>> into all the reasons: I would have done it already if it were that simple.
>> I can certainly help address some things and lean on existing code and
>> efforts, but at the moment the best I can offer is to work on things as
>> driven by outside pressures, items, or demands.
>>
>> If I see others improving or redoing any of this core cloud code, though,
>> I’d certainly lend a hand on those efforts. Outside of making changes based
>> on external needs, I just got out from under the solo kamikaze, and I can’t
>> dive back in unless it’s on contained items and goals that satisfy
>> someone’s needs, or by joining an existing multi-crew effort or goal.
>>
>> If I had to randomly pull threads, repeat efforts yet one more time, and
>> funnel that work through a gauntlet of uninvolved, well-intentioned
>> developers, neither I nor anyone else would be pleased.
>>
>> Mark
>>
>> On Fri, Oct 1, 2021 at 2:17 PM Mark Miller <[email protected]> wrote:
>>
>>> That covers a lot of the current silliness you will see - pretty simple,
>>> as most of it comes down to removing silly stuff - but you can find some
>>> related wildness in ZkController#register.
>>>
>>> // check replica's existence in clusterstate first
>>>
>>> zkStateReader.waitForState(collection, 100, TimeUnit.MILLISECONDS,
>>>     (collectionState) -> getReplicaOrNull(collectionState, shardId, coreZkNodeName) != null);
>>>
>>> A 100ms wait is no biggie, and at least it uses waitForState, but we should
>>> not need to fetch our own clusterstate from ZK, or care about waiting on it
>>> here - if there is a piece of data we need, it should have been passed into
>>> the core create call.
>>>
>>> Next we get the shard terms object so we can later create our shard terms
>>> entry (LIR).
>>>
>>> It is slow, and bug-inducingly complicated, to have each replica do this
>>> here, fighting the others to add an initial entry. You can create the
>>> initial shard terms for a replica when you create or update the
>>> clusterstate (term {replicaname=0}), and you can do it in a single ZK call.
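>>>
>>> Roughly the shape of that, as a sketch against the raw ZooKeeper multi API -
>>> the paths, class and method names, and data handling here are illustrative,
>>> not the actual Solr code:
>>>
>>> import java.nio.charset.StandardCharsets;
>>> import java.util.Arrays;
>>> import org.apache.zookeeper.CreateMode;
>>> import org.apache.zookeeper.KeeperException;
>>> import org.apache.zookeeper.Op;
>>> import org.apache.zookeeper.ZooDefs;
>>> import org.apache.zookeeper.ZooKeeper;
>>>
>>> class RegisterReplicaSketch {
>>>
>>>   // Sketch: write the replica into state.json and seed its shard term
>>>   // ({replicaName=0}) in one atomic ZooKeeper multi call, instead of every
>>>   // replica racing later to add its own initial terms entry.
>>>   static void addReplicaWithInitialTerm(ZooKeeper zk, String collection,
>>>       String shard, String replicaName, byte[] updatedStateJson,
>>>       int stateJsonVersion) throws KeeperException, InterruptedException {
>>>
>>>     String statePath = "/collections/" + collection + "/state.json";
>>>     String termsPath = "/collections/" + collection + "/terms/" + shard;
>>>
>>>     // Initial terms doc for this replica; the real code would merge with
>>>     // any existing terms for the shard rather than always creating the node.
>>>     byte[] initialTerms =
>>>         ("{\"" + replicaName + "\":0}").getBytes(StandardCharsets.UTF_8);
>>>
>>>     zk.multi(Arrays.asList(
>>>         Op.setData(statePath, updatedStateJson, stateJsonVersion),
>>>         Op.create(termsPath, initialTerms, ZooDefs.Ids.OPEN_ACL_UNSAFE,
>>>             CreateMode.PERSISTENT)));
>>>   }
>>> }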
>>>
>>>
>>> // in this case, we want to wait for the leader as long as the leader might
>>> // wait for a vote, at least - but also long enough that a large cluster has
>>> // time to get its act together
>>> String leaderUrl = getLeader(cloudDesc, leaderVoteWait + 600000);
>>>
>>> Now we do getLeader, a polling operation that should not be one, and wait
>>> possibly forever for it. As I mention in the notes on leader sync, there
>>> should be little wait here at most. It's also one of a variety of places
>>> that, even with the polling removed, suck to wait on. I'm a fan of
>>> thousands of cores per machine not being an issue, and in many of these
>>> cases you can't achieve that with 1000 threads hanging out all over, even
>>> if they are not blind polling.
>>>
>>> This is one of the simpler cases where that can be addressed. I break this
>>> method in two and enhance the ZkStateReader waitForState functionality so
>>> you can pass a Runnable to execute when ZkStateReader is notified and the
>>> given predicate matches. So no need for thousands, or hundreds, or dozens
>>> of slackers here: do a couple of base register items, call waitForState
>>> with a Runnable that runs the second half of the logic when a leader shows
>>> up in ZkStateReader, and go away. We can't eat up threads like this in all
>>> these cases.
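>>>
>>> The shape of that, very roughly - the names here are illustrative, not the
>>> actual ZkStateReader API, but it shows the idea of a predicate plus a
>>> Runnable being registered instead of a thread being parked:
>>>
>>> import java.util.Iterator;
>>> import java.util.Map;
>>> import java.util.concurrent.ConcurrentHashMap;
>>> import java.util.concurrent.ExecutorService;
>>> import java.util.concurrent.Executors;
>>> import java.util.concurrent.atomic.AtomicLong;
>>> import java.util.function.Predicate;
>>>
>>> // Sketch of a callback-style waitForState: register a predicate and a
>>> // Runnable; when the state reader is notified of new state, matching
>>> // callbacks run once and are removed. No thread sits in a wait loop.
>>> class StateCallbackRegistry<S> {
>>>
>>>   private static final class Entry<S> {
>>>     final Predicate<S> predicate;
>>>     final Runnable onMatch;
>>>     Entry(Predicate<S> predicate, Runnable onMatch) {
>>>       this.predicate = predicate;
>>>       this.onMatch = onMatch;
>>>     }
>>>   }
>>>
>>>   private final Map<Long, Entry<S>> callbacks = new ConcurrentHashMap<>();
>>>   private final AtomicLong ids = new AtomicLong();
>>>   private final ExecutorService executor = Executors.newCachedThreadPool();
>>>
>>>   // e.g. register(state -> getLeaderOrNull(state, shard) != null,
>>>   //               this::finishRegister) and return - nothing blocks.
>>>   long register(Predicate<S> predicate, Runnable onMatch) {
>>>     long id = ids.incrementAndGet();
>>>     callbacks.put(id, new Entry<>(predicate, onMatch));
>>>     return id;
>>>   }
>>>
>>>   // Called by the state reader whenever new state arrives from ZK.
>>>   void onStateChanged(S newState) {
>>>     Iterator<Map.Entry<Long, Entry<S>>> it = callbacks.entrySet().iterator();
>>>     while (it.hasNext()) {
>>>       Entry<S> e = it.next().getValue();
>>>       if (e.predicate.test(newState)) {
>>>         it.remove();
>>>         executor.submit(e.onMatch);
>>>       }
>>>     }
>>>   }
>>> }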
>>>
>>> Now you can also easily shut down and reload cores and do various things
>>> that are currently harassed by waits like this, slacking off in their wait
>>> loops.
>>>
>>>
>>>
>>> The rest is just a continuation of this game when it comes to leader
>>> selection and finalization, collection creation, and replica spin-up. You
>>> make ZkStateReader actually efficient. You make multiple and lazy
>>> collections work appropriately, and not super inefficiently.
>>>
>>> You make leader election a sensible bit of code. As part of ZkStateReader
>>> sensibility, you remove the need for a billion client-side watches in ZK,
>>> and in many cases the need for a thousand Watcher implementations and
>>> instances.
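>>>
>>> Same idea from the other direction, again just a sketch with made-up names:
>>> one watch per collection lives in the state reader, and components register
>>> plain listeners against it instead of each installing their own ZK watch
>>> and Watcher implementation.
>>>
>>> import java.util.Map;
>>> import java.util.Set;
>>> import java.util.concurrent.ConcurrentHashMap;
>>> import java.util.function.Consumer;
>>>
>>> class CollectionWatchHub<S> {
>>>
>>>   private final Map<String, Set<Consumer<S>>> listeners = new ConcurrentHashMap<>();
>>>
>>>   // A component interested in a collection registers here; only the first
>>>   // registration would cause an actual ZK watch to be set (not shown).
>>>   void addListener(String collection, Consumer<S> listener) {
>>>     listeners.computeIfAbsent(collection, c -> ConcurrentHashMap.newKeySet())
>>>         .add(listener);
>>>   }
>>>
>>>   void removeListener(String collection, Consumer<S> listener) {
>>>     Set<Consumer<S>> set = listeners.get(collection);
>>>     if (set != null) {
>>>       set.remove(listener);
>>>     }
>>>   }
>>>
>>>   // Called once per ZK notification for a collection; fans out in-process.
>>>   void onCollectionChanged(String collection, S newState) {
>>>     Set<Consumer<S>> set = listeners.get(collection);
>>>     if (set != null) {
>>>       set.forEach(l -> l.accept(newState));
>>>     }
>>>   }
>>> }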
>>>
>>> You let the components dictate how often requests go to services and
>>> coalesce the dependent code's requests, instead of letting the dependents
>>> dictate service request cadence and size, and you do a lot less silliness
>>> like serializing large JSON structures for bit-sized data updates (roughly
>>> the shape sketched below). Then scaling to tens of thousands and even
>>> hundreds of thousands of replicas and collections is doable even on single
>>> machines and a handful of Solr instances, to say nothing of pulling in more
>>> hardware. Everything required is cheap cheap cheap. It's the mountain of
>>> unrequired that is expensive expensive expensive.
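>>>
>>> For the bit-sized updates, something along these lines - purely
>>> illustrative, the path and encoding are made up for the sketch: a replica
>>> state flip written as a tiny znode of its own, rather than re-serializing
>>> and rewriting the whole collection's state.json for a one-field change.
>>>
>>> import java.nio.charset.StandardCharsets;
>>> import org.apache.zookeeper.CreateMode;
>>> import org.apache.zookeeper.KeeperException;
>>> import org.apache.zookeeper.ZooDefs;
>>> import org.apache.zookeeper.ZooKeeper;
>>>
>>> class TinyStateUpdateSketch {
>>>
>>>   static void publishReplicaState(ZooKeeper zk, String collection,
>>>       String replicaName, String state)
>>>       throws KeeperException, InterruptedException {
>>>
>>>     String path = "/collections/" + collection + "/replica_states/" + replicaName;
>>>     // A handful of bytes, e.g. "active", instead of the full state.json.
>>>     byte[] data = state.getBytes(StandardCharsets.UTF_8);
>>>
>>>     if (zk.exists(path, false) == null) {
>>>       zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
>>>     } else {
>>>       zk.setData(path, data, -1);
>>>     }
>>>   }
>>> }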
>>>
>>>
>>> On Fri, Oct 1, 2021 at 12:47 PM Mark Miller <[email protected]>
>>> wrote:
>>>
>>>> Ignoring lots of polling, inefficiencies, early defensive raw sleeps,
>>>> various races and bugs, and a laundry list of items involved in making
>>>> the leader processes good enough to enter a collection creation contest,
>>>> here is a more practical, small set of notes, off the top of my head from
>>>> a quick inspection, on what is currently just in-your-face nonsensical.
>>>>
>>>> https://gist.github.com/markrmiller/233119ba84ce39d39960de0f35e79fc9
>>>>
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://about.me/markrmiller
>>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
--
- Mark

http://about.me/markrmiller
