On Thu, Sep 30, 2021 at 3:31 AM Ilan Ginzburg <[email protected]> wrote:
> Independent of how interactions with ZK are implemented (direct or via
> Curator), we should first clean up what these interactions do or expect.
>
> Take shard leader elector. First a replica is elected, then we check if it
> is fit for the job, run another election if not, look at other replicas
> (hopefully) participating in the election, wait a bit more (total wait can
> be 6 minutes), then might decide that an unfit leader is still fit…
>

Personally, I looked at that and saw a whole different set of problems that had to be solved. No one around was on the same page as me there though, and with everyone else interested in sitting down and coming up with new designs, I tend to cut out fast when I don’t feel something is going somewhere. That was a different day though: different people, different agenda.

I will say, I have code (nothing even close to a patch or practically relevant here) that follows the whiteboard of that design and is sub-second if there is no data to be synced, and very damn fast even if there is. That is the kind of talk that tends to be taken as me promoting or defending some design, but I’m pretty design agnostic, unless a design somehow makes things impossible.

When I say the same design, that doesn’t mean it does all the same steps. Just that it follows the same design that Yonik drove on the whiteboard. I drove the broad impl while Yonik hit critical blocks, and then as my rubber hit the road, I’d hammer him as needed and lots of back and forth would fill in the details.

For a host of reasons, the impl would be a very rough and broad sketch of the actual whiteboard design. Some of the least dev time was spent on that leader sync process. Just as one example, the leader syncs to replicas and then asks replicas to sync to the leader. That second phase is, I believe, kind of silly, messed up, and also unnecessary, which is a common theme.

I’m surprised to hear it can take 6 minutes. Hard to remember where every random thing is in main.
At the start, as kind of a prop, we would do some ridiculous waiting, being very conservative about preventing super easy large data loss with no code implemented to do anything sensible. These days, leader initiated recovery is there to fill that gap. As you can say about everything, it has some issues, but fundamentally filling that gap is not one of them.

Then peersync can be much faster with some details tweaked. It’s Yonik code, so it always ends up being more about adjusting the block and positioning it than changing its fundamental structure. Replication: plenty of ugly, slow, inefficient. RecoveryStrategy: a mess, a mess of a class, but you’d still recognize it in my code. Leader election: again, same fundamental design, recognizable, but fast, stable, efficient. Plenty of that kind of silly, messed up and unnecessary, and you name it.

So same intended design, separated by a whole hell of a lot of changes. If there was a yearly search engine derby that pitted such processes against each other, I’d march over with glee. Would probably be riddled with excitement at the prospect.

So my feeling: pick whichever design fixes or changes you think will produce a working system. Unless it’s unworkable craziness, the impl of everything will matter 50x more, so just nail that and the rest will be fine.

>
> Before moving this to curator, we should likely simplify the approach or
> it might not look good on curator.
>

When I did curator, I changed plenty. Still the same fundamental design, but it was impossible not to look at the possibilities and its algorithms and kind of go to town. That was a bit of a luxury though. The mechanics of community, resources, collaboration, bike shed painting, existing framework forward momentum … anyone that navigates through with such ambitious plans at this point in time will have a huge pile of my admiration.

> I’m not that worried about Autoscaling (removed in main) or Overseer
> (removed in main if you set the right config).
>

Oh I had no interest in autoscaling relative to many many many things. That’s really just a stand-in for a variety of ambitious higher layers that AB has a talent for, and the system had a distaste for.

It just pains my sensibilities. A business will have needs and customers and the things a business will have. And a developer will be assigned to go turn those needs into code, and it’s quite frustrating when those forces create situations where a good design, a solid honest effort, and someone with a knack for such implementations are not going to put very good utility and efficiency into the world, given that those systems often really, really need a solid foundation. Not that it’s some huge injustice, but I am very prejudiced against such waste. I like to see good work by good people harnessed into good things. This is why I ended up running from private development and into Lucene.

>
> Many other things to worry about though (for example cluster state cache
> maintained async on all nodes at cost of heavy ZK usage on every change).
>

That was honestly one of the easier items, not that it took 5 minutes. I keep trying to get people to sit down with a pen and paper and sketch out what actually has to be communicated. How often. What data structures actually have to move. It’s about 100x less than what goes on in almost every dimension. ZK and that design are so damn fast and scalable, oh man.

To me it’s the same as the other stuff. Pick a design, they are all the same to me unless something is fundamentally ridiculous. As long as that design does not do 100x more than makes sense, and inefficiently even at that bar, it will be fine. IMO, the problem is trying to come up with a design that fits the rest of the system and its expectations and connections, and often its problems and/or inefficiencies.
I feel like, as often seems to be the case, designs are likely going to be guided by trying to come up with something that kind of attempts to mitigate, perhaps at grander and grander scales. But always with such potential to be compromised by the structure it wants to join and strengthen. There are a surprising number of behaviors and features and SQL engines and … well, let’s just say, I think the best hope on such an endeavor would be to get wide permission for an axe and a lot of sad people with various attachments and dependencies on all the things that are disregarded.

That’s why I just went through everything. Fix it all. Make it all work. Make it all efficient and fast. Leave no man but the ridiculous behind. Now that process is not easy. But it puts you in a situation to really do some interesting things that are not compromised or heavily reduced and scaled down, or … anyway, it’s not practical information that if you make it all good, you are in a position to do some great things. I never saw any other path that wasn’t likely to be heavily compromised and unsatisfying or essentially a no-holds-barred reboot. I was never into a reboot without first getting to the bottom of the boot. I’ve seen that “let’s just do version 2” game played before.

Unfortunately, the world is set up such that I can’t reasonably make the trade-offs to even really do anything with the work I’ve done at a scale that would make sense. I think, for similar reasons, large-scale work on Solr proper has probably seen its most active days.

So yeah, everyone has always brought up that we need some designs, that we need to get everyone together and start planning it out. I say go to it. That type of collaboration has not gone on for a while, but I don’t think you will find anyone who would object to it. Personally, I’d let others lead hashing out any designs. It’s easier to get more people, of all kinds, in on that.
I think the implementation ends up being way more important and ends up with far fewer resources, so I’d sign up for some contribution there. Impl will float any design but the silly or unworkable very nicely if given the fuel.

Mark

> Ilan
>
> On Thu 30 Sep 2021 at 01:02, Mark Miller <[email protected]> wrote:
>
>> You actually capture most of the history of cloud there AB.
>>
>> ZK is the heart of the system. It’s a rare chance you get the time or
>> financing to lay that out on something that will be used.
>>
>> I didn’t get it done, changed jobs, and that mostly closed the window on
>> that.
>>
>> Then you have a poor heart that would take a good amount of time and
>> experience for anyone to really fully understand all the nuts and bolts of,
>> even if you stood it up. And it’s about the equivalent of a poorly written
>> concurrent program.
>>
>> So when you come along and try to put something like autoscaling on it,
>> it’s going to subvert you the whole way. And unless you are going to change
>> auto scaling to discover and rework all the problems in the heart of the
>> system, not a lot you can do about it. And that completely ignores the
>> overseer end of it.
>>
>> It’s a shame, I could set up a great heart to put something like auto
>> scaling on for you now. But the ship has sailed. Very hard to claw that
>> back and the world has adjusted to getting what they can from what is.
>>
>> But yeah, curator is a huge improvement on a variety of those issues. And
>> I invested enough into it to know it’s good. It’s fast. It’s better and more
>> APIs and algorithms - documented. Maintained and pushed forward by a
>> separate group dedicated to the task.
>>
>> But I can tell you, it’s by no means some kind of Rubik’s cube, but it is
>> no small lift.
>>
>> Mark
>>
>> On Wed, Sep 29, 2021 at 9:13 AM Mark Miller <[email protected]>
>> wrote:
>>
>>> I very much agree.
>>> That code is the root of a very surprising amount of evil and has
>>> been for a surprisingly long time.
>>>
>>> There is a long list of reasons, which I won’t iterate, why I don’t see
>>> that as likely happening though - just starting with I’ve brought it up to
>>> various people over a couple years and gotten pushback just at the top.
>>> Roughly, it’s on the scale of work and invasiveness, even with some
>>> incremental paths, that I don’t see the path or resources to seriously
>>> consider it myself. You can go back through jira history for quite a while
>>> before you find that kind of item not looking out of place.
>>>
>>> Mark
>>>
>>> On Wed, Sep 29, 2021 at 2:05 AM Andrzej Białecki <[email protected]> wrote:
>>>
>>>> +1 to start working towards using Curator, this is long overdue and
>>>> sooner or later we need to eat this frog - as you dig deeper and deeper it
>>>> turns out that many issues in Solr can be attributed to our home-grown ZK
>>>> code, there are maybe 2 people on the Solr team who understand what’s going
>>>> on there (and I’m certainly not one of them!). And the maintenance cost is
>>>> just too high over time.
>>>>
>>>> --
>>>>
>>>> Andrzej Białecki
>>>>
>>>> On 28 Sep 2021, at 21:31, Mark Miller <[email protected]> wrote:
>>>>
>>>> P.S. this is not actually the zookeeper design I would submit to any
>>>> competition :)
>>>>
>>>> I’ve gone different routes in addressing the zookeeper shortfall. This
>>>> one is relatively easy, impactful and isolated for the right developer.
>>>>
>>>> Personally, with fewer scale and isolation limits, by far the best
>>>> thing I’ve done is remove almost all of our zk recipes and custom stuff and
>>>> use Apache curator and replace our stuff as well as improve and expand on
>>>> things using their large stable of well behaving recipes. I don’t think raw
>>>> zookeeper is good for a project of more than a few people at most.
>>>> But I wouldn’t toss that out there, it’s a much larger undertaking,
>>>> no one is going to bite on that in passing.
>>>>
>>>> Mark
>>>> --
>>>> - Mark
>>>>
>>>> http://about.me/markrmiller

--
- Mark

http://about.me/markrmiller
