Tim was referring to code that addresses those issues from the ref branch. I've been trying to remember the items that Ilan has brought up; for some reason I thought this was the third one, but other than leader sync I can only come up with shard leader loss and overseer loss.
I also recalled I have slide tools for this type of thing, so here is a quick browse through leader election. It's a bit more irreverent because I can only skim the surface of the complexity involved, and because there are no real small-effort, high-impact improvements. https://www.solrdev.io/leader-election-adventure.html

On Sat, Oct 2, 2021 at 7:51 AM David Smiley <[email protected]> wrote:

> I just want to say that I appreciate the insights you shared over the last
> couple of days. By "copy paste", was Tim referring to copying your insights
> and pasting them into the code? This is what I was thinking. Or at least
> some way to make these insights more durable / findable. Could be a link
> from the code into maybe a wiki page or something.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Oct 1, 2021 at 11:02 PM Mark Miller <[email protected]> wrote:
>
>> Tim hit me with the obvious question here.
>>
>> "I'm assuming there are reasons, but what about a little copy-paste on
>> some of these issues that you mentioned?"
>>
>> I say the obvious question because I kind of flippantly jump through some
>> lines of code and then say: and then you just do a, b, and c, and that's
>> the ballgame.
>>
>> There are a lot of reasons I can't cut and paste though, and I can open
>> almost any class and annotate a similar set of issues. So without diving
>> into all the reasons: I would have already done it if it were that simple.
>> I can certainly help address some things and lean on existing code and
>> efforts, but at the moment I'm in a position where the best I can do is
>> work on things as needed by outside pressures, items, or demands.
>>
>> If I see others improving or redoing any of this core cloud code though,
>> I'd certainly lend a hand on those efforts. Outside of making changes
>> based on external needs, I just got out from under the solo kamikaze, and
>> I can't dive back in without it being on contained items and goals that
>> satisfy someone's needs, or joining an existing multi-person effort or
>> goal.
>>
>> If I had to randomly pull threads, repeat efforts yet one more time, and
>> funnel that work through a gauntlet of uninvolved, well-intentioned
>> developers, neither I nor anyone else would be pleased.
>>
>> Mark
>>
>> On Fri, Oct 1, 2021 at 2:17 PM Mark Miller <[email protected]> wrote:
>>
>>> That covers a lot of the current silliness you will see, pretty simply,
>>> as most of it comes down to removing silly stuff, but you can find some
>>> related wildness in ZkController#register.
>>>
>>> // check replica's existence in clusterstate first
>>> zkStateReader.waitForState(collection, 100, TimeUnit.MILLISECONDS,
>>>     (collectionState) -> getReplicaOrNull(collectionState, shardId, coreZkNodeName) != null);
>>>
>>> A 100ms wait, no biggie, and at least it uses waitForState, but we should
>>> not need to get our own clusterstate from ZK here, so why wait on it at
>>> all - if there is an item of data we need, it should have been passed
>>> into the core create call.
>>>
>>> Next we get the shard terms object so we can later create our shard terms
>>> entry (LIR).
>>>
>>> It is slow, and bug-inducingly complicated, to have each replica do this
>>> here, fighting the others to add an initial entry. You can create the
>>> initial shard terms for a replica when you create or update the
>>> clusterstate (term {replicaname=0}), and you can do it in a single ZK
>>> call.
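(For illustration, a minimal sketch of what that single call could look like, using the plain ZooKeeper multi() transaction API. The znode paths and JSON payloads below are simplified placeholders, not Solr's actual layout, and the collection's base znode is assumed to already exist.)

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Op;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SeedShardTerms {

      /**
       * Write the collection's state and the shard's initial terms entry
       * ({"<replicaName>":0}) in one ZooKeeper multi() transaction, so replicas
       * never have to race each other to seed the terms node at startup.
       * Assumes /collections/<collection> already exists; paths and payloads
       * are placeholders, not Solr's exact layout.
       */
      static void createStateAndInitialTerms(ZooKeeper zk, String collection, String shard,
          String replicaName, byte[] stateJson) throws KeeperException, InterruptedException {
        String base = "/collections/" + collection;
        byte[] terms = ("{\"" + replicaName + "\":0}").getBytes(StandardCharsets.UTF_8);

        // Applied atomically and in order: either all three znodes exist afterwards
        // or none do. Adding a replica later would pair a setData on the state node
        // with a setData on the terms node in the same way.
        List<Op> ops = Arrays.asList(
            Op.create(base + "/state.json", stateJson, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
            Op.create(base + "/terms", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
            Op.create(base + "/terms/" + shard, terms, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT));

        zk.multi(ops);
      }
    }

Doing it transactionally at clusterstate-creation time means the state and the initial term entry either both exist afterwards or neither does, so no replica ever has to retry-loop to seed its own entry.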
>>>
>>> // in this case, we want to wait for the leader as long as the leader might
>>> // wait for a vote, at least - but also long enough that a large cluster has
>>> // time to get its act together
>>> String leaderUrl = getLeader(cloudDesc, leaderVoteWait + 600000);
>>>
>>> Now we do getLeader, a polling operation that should not be one, and we
>>> wait possibly forever for it. As I mentioned in the notes on leader sync,
>>> there should be little wait there at most, and the same goes here. It's
>>> also one of a variety of places that sucks to wait on even if you remove
>>> the polling. I'm a fan of thousands of cores per machine not being an
>>> issue, and in many of these cases you can't achieve that with a thousand
>>> threads hanging out all over, even if they are not blindly polling. This
>>> is one of the simpler cases where that can be addressed. I break this
>>> method in two and enhance the ZkStateReader waitForState functionality: I
>>> allow you to pass a Runnable to execute when ZkStateReader is notified and
>>> the given predicate matches. So no need for thousands, or hundreds, or
>>> dozens of slackers here. Do a couple of base register items, call
>>> waitForState with a Runnable that invokes the second part of the logic
>>> when a leader shows up in ZkStateReader, and go away. We can't eat up
>>> threads like this in all these cases.
>>>
>>> Now you can also easily shut down and reload cores and do various things
>>> that are currently harassed by waits like this slacking off in these wait
>>> loops.
>>>
>>> The rest is just a continuation of this game when it comes to leader
>>> selection and finalization, collection creation, and replica spin-up. You
>>> make ZkStateReader actually efficient. You make multiple and lazy
>>> collections work appropriately, and not be super inefficient.
>>>
>>> You make leader election a sensible bit of code. As part of making
>>> ZkStateReader sensible, you remove the need for a billion client-side
>>> watches in ZK and, in many cases, the need for a thousand watcher
>>> implementations and instances.
>>>
>>> You let the components dictate how often requests go to services and
>>> coalesce dependent code requests, instead of letting the dependents
>>> dictate service request cadence and size, and you do a lot less silliness
>>> like serializing large JSON structures for bit-sized data updates. Scaling
>>> to tens and even hundreds of thousands of replicas and collections is then
>>> doable even on a single machine or a handful of Solr instances, to say
>>> nothing of pulling in more hardware. Everything required is cheap, cheap,
>>> cheap. It's the mountain of the unrequired that is expensive, expensive,
>>> expensive.
>>>
>>>
>>> On Fri, Oct 1, 2021 at 12:47 PM Mark Miller <[email protected]> wrote:
>>>
>>>> Ignoring lots of polling, inefficiencies, early defensive raw sleeps,
>>>> various races and bugs, and a laundry list of items involved in making
>>>> leader processes good enough to enter a collection creation contest, here
>>>> is a more practical, small set of notes off the top of my head, from a
>>>> quick inspection, on what is currently just in-your-face nonsensical.
>>>>
>>>> https://gist.github.com/markrmiller/233119ba84ce39d39960de0f35e79fc9
>>>>
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://about.me/markrmiller
>>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>

--
- Mark

http://about.me/markrmiller
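(A minimal sketch of the "waitForState with a Runnable" idea described above; the class and method names are made up for illustration and are not the real ZkStateReader API. A caller registers a predicate plus a continuation instead of parking a thread; when a state notification satisfies the predicate, the continuation runs and the waiter is dropped.)

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.function.Predicate;

    /**
     * Sketch of "waitForState with a Runnable": callers hand over a predicate and
     * a continuation instead of blocking. ClusterState is a stand-in for whatever
     * state object the real reader publishes on ZK notifications.
     */
    public class NonBlockingStateWaiter<ClusterState> {

      private static final class Waiter<S> {
        final Predicate<S> predicate;
        final Runnable continuation;

        Waiter(Predicate<S> predicate, Runnable continuation) {
          this.predicate = predicate;
          this.continuation = continuation;
        }
      }

      private final Map<Object, Waiter<ClusterState>> waiters = new ConcurrentHashMap<>();
      private final ExecutorService executor = Executors.newCachedThreadPool();

      /** Register the continuation and return immediately; no thread blocks or polls. */
      public void waitForState(Predicate<ClusterState> predicate, Runnable continuation) {
        waiters.put(new Object(), new Waiter<>(predicate, continuation));
      }

      /** Called from the ZK watch/notification path whenever new state arrives. */
      public void onStateChanged(ClusterState newState) {
        waiters.entrySet().removeIf(entry -> {
          if (entry.getValue().predicate.test(newState)) {
            executor.submit(entry.getValue().continuation); // run the deferred second half
            return true;                                    // and drop the waiter
          }
          return false;
        });
      }
    }

In the register() case, the predicate would be "a leader for my shard appears in the collection state" and the continuation would be the second half of the registration logic, so thousands of cores spinning up cost a map entry each instead of a parked thread each.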

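(And a minimal sketch of the "let components dictate request cadence and coalesce dependent requests" point; again the names are invented for illustration. Callers record per-replica state changes as often as they like, and the owning component flushes the merged batch on its own schedule, so one write covers many small updates instead of a large structure being serialized for every tiny change.)

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.Consumer;

    /**
     * Coalesces many small per-replica state changes into one periodic write.
     * The component that owns the state store decides the flush cadence; callers
     * just record what changed and return immediately.
     */
    public class CoalescingStatePublisher implements AutoCloseable {

      private final ConcurrentHashMap<String, String> pending = new ConcurrentHashMap<>();
      private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
      private final Consumer<Map<String, String>> flushTarget;

      public CoalescingStatePublisher(Consumer<Map<String, String>> flushTarget, long flushEveryMs) {
        this.flushTarget = flushTarget;
        scheduler.scheduleAtFixedRate(this::flush, flushEveryMs, flushEveryMs, TimeUnit.MILLISECONDS);
      }

      /** Record a replica state change; later changes to the same replica simply overwrite. */
      public void publish(String replicaName, String state) {
        pending.put(replicaName, state);
      }

      private void flush() {
        if (pending.isEmpty()) {
          return;
        }
        Map<String, String> batch = new HashMap<>();
        for (Map.Entry<String, String> entry : pending.entrySet()) {
          batch.put(entry.getKey(), entry.getValue());
          // Only remove if the value is unchanged, so an update that races the
          // flush is kept for the next batch instead of being lost.
          pending.remove(entry.getKey(), entry.getValue());
        }
        flushTarget.accept(batch); // one write for the whole batch
      }

      @Override
      public void close() {
        scheduler.shutdown();
        flush();
      }
    }

A node could construct one of these with a flushTarget that merges the batch into whatever it publishes to ZK (or sends to the overseer), so request size and cadence are controlled by this component rather than by however many replicas happen to be changing state at once.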