Re: ZkCmdExecutor

Mark Miller Fri, 01 Oct 2021 01:39:37 -0700

On Thu, Sep 30, 2021 at 3:31 AM Ilan Ginzburg <[email protected]> wrote:

> Independent of how interactions with ZK are implemented (direct or via
> Curator), we should first clean up what these interactions do or expect.
>
> Take shard leader elector. First a replica is elected, then we check if it
> is fit for the job, run another election if not, look at other replicas
> (hopefully) participating in the election, wait a bit more (total wait can
> be 6 minutes), then might decide that an unfit leader is still fit…
>

Personally, I looked at that and saw a whole different set of problems that
had to be solved. No one around was on the same page as me there, though,
and with everyone else interested in sitting down and coming up with new
designs, I tend to cut out fast when I don’t feel something is going
anywhere. That was a different day, though: different people, different
agenda.

I will say, I have code (nothing even close to a patch, or practically
relevant here) that follows the whiteboard version of that design, and it
is sub-second if there is no data to be synced, and very damn fast even if
there is.

That’s the kind of talk that tends to be taken as me promoting or defending
some design, but I’m pretty design-agnostic, unless a design somehow makes
things impossible.

When I say the same design, that doesn’t mean it does all the same steps,
just that it follows the same design that Yonik drove on the whiteboard. I
drove the broad impl while Yonik hit critical blocks, and then, as my rubber
hit the road, I’d hammer him as needed and lots of back and forth would
fill in the details. For a host of reasons, the impl ended up being a very
rough and broad sketch of the actual whiteboard design.

Some of the least dev time was spent on that leader sync process. Just as
one example, the leader syncs to the replicas and then asks the replicas to
sync to the leader. That second phase is, I believe, kind of silly, messed
up, and also unnecessary. Which is a common theme.

I’m surprised to hear it can take 6 minutes. It’s hard to remember where
every random thing is in main. At the start, as kind of a prop, we would do
some ridiculous waiting, being very conservative about preventing super
easy, large data loss when no code was implemented yet to do anything
sensible about it.

These days, leader-initiated recovery is there to fill that gap. As you can
say about everything, it has some issues, but filling that gap is
fundamentally not one of them.

Then PeerSync can be much faster with some details tweaked. It’s Yonik code,
so the work always ends up being more about adjusting the block and
positioning it than about its fundamental structure. Replication: plenty of
ugly, slow, and inefficient. RecoveryStrategy: a mess of a class, but you’d
still recognize it in my code. Leader election: again, the same fundamental
design, recognizable, but fast, stable, and efficient. Plenty of that kind
of silly, messed up, unnecessary, you name it.

So, the same intended design, separated by a whole hell of a lot of
changes. If there were a yearly search engine derby that pitted such
processes against each other, I’d march over with glee. I would probably be
riddled with excitement at the prospect.

So my feeling is: pick whatever design fixes or changes you think will
produce a working system. Unless it’s unworkable craziness, the impl of
everything will matter 50x more, so just nail that and the rest will be
fine.

>
> Before moving this to curator, we should likely simplify the approach or
> it might not look good on curator.
>

When I did Curator, I changed plenty. Still the same fundamental design, but
it was impossible not to look at the possibilities and its algorithms and
kind of go to town.

That was a bit of a luxury, though. The mechanics of community, resources,
collaboration, bike-shed painting, the forward momentum of the existing
framework … anyone who navigates through all that with such ambitious plans
at this point in time will have a huge pile of my admiration.

> I’m not that worried about Autoscaling (removed in main) or Overseer
> (removed in main if you set the right config).
>

Oh, I had no interest in autoscaling relative to many, many, many things.
That’s really just a stand-in for a variety of ambitious higher layers that
AB has a talent for, and that the system had a distaste for. It just pains
my sensibilities. A business will have needs and customers and all the
things a business has. A developer will be assigned to go turn those needs
into code, and it’s quite frustrating when those forces create situations
where a good design, a solid honest effort, and someone with a knack for
such implementations still can’t put much useful, efficient work out into
the world, because those systems really, really need a solid foundation.
Not that it’s some huge injustice, but I am very prejudiced against such
waste. I like to see good work by good people harnessed into good things.
This is why I ended up running from private development and into Lucene.

>
> Many other things to worry about though (for example cluster state cache
> maintained async on all nodes at cost of heavy ZK usage on every change).
>

That was honestly one of the easier items, not that it took 5 minutes. I
keep trying to get people to sit down with pen and paper and sketch out
what actually has to be communicated. How often. What data structures
actually have to move. It’s about 100x less than what goes on, in almost
every dimension. ZK and that design are so damn fast and scalable, oh man.
To me it’s the same as the other stuff. Pick a design; they are all the
same to me unless something is fundamentally ridiculous. As long as that
design doesn’t do 100x more than makes sense, and inefficiently even at
that bar, it will be fine.

IMO, the problem is trying to come up with a design that fits the rest of
the system, with its expectations and connections, which are often
problematic and/or inefficient. I feel like, as often seems to be the case,
designs are likely going to be guided by trying to come up with something
that attempts to mitigate that, perhaps at grander and grander scales. But
always with such potential to be compromised by the very structure it wants
to join and strengthen.

There are a surprising number of behaviors and features and SQL engines and
… well, let’s just say I think the best hope for such an endeavor would be
to get wide permission for an axe, and there would be a lot of sad people
with various attachments to and dependencies on all the things that get
disregarded.

That’s why I just went through everything. Fix it all. Make it all work.
Make it all efficient and fast. Leave no man behind but the ridiculous. Now,
that process is not easy. But it puts you in a position to really do some
interesting things that are not compromised or heavily reduced and scaled
down, or … anyway, it’s not practical information that if you make it all
good, you are in a position to do something great. I never saw any other
path that wasn’t likely to be heavily compromised and unsatisfying, or else
essentially a no-holds-barred reboot. I was never into a reboot without
first getting to the bottom of the boot. I’ve seen that “let’s just do
version 2” game played before.

Unfortunately, the world is set up such that I can’t reasonably make the
trade-offs to really do anything with the work I’ve done at a scale that
would make sense. I think, for similar reasons, large-scale work on Solr
proper has probably seen its most active days.

So yeah, everyone has always said we need some designs, we need to get
everyone together and start planning it out. I say go to it. That type of
collaboration has not happened for a while, but I don’t think you will find
anyone who would object to it.

Personally, I’d let others lead on hashing out any designs. It’s easier to
get more people, of all kinds, in on that.

I think the implementation ends up being way more important and ends up
with far fewer resources, so I’d sign up for some contribution there. Impl
will float any design but the silly or unworkable very nicely if given the
fuel.

Mark

> Ilan
>
> On Thu 30 Sep 2021 at 01:02, Mark Miller <[email protected]> wrote:
>
>> You actually capture most of the history of cloud there, AB.
>>
>> ZK is the heart of the system. It’s a rare chance you get the time or
>> financing to lay that out on something that will be used.
>>
>> I didn’t get it done, changed jobs, and that mostly closed the window on
>> that.
>>
>> Then you have a poor heart that would take a good amount of time and
>> experience for anyone to really fully understand all the nuts and bolts of,
>> even if you stood it up. And it’s about the equivalent of a poorly written
>> concurrent program.
>>
>> So when you come along and try to put something like autoscaling on it,
>> it’s going to subvert you the whole way. And unless you are going to turn
>> the autoscaling work into discovering and reworking all the problems in the
>> heart of the system, there’s not a lot you can do about it. And that
>> completely ignores the overseer end of it.
>>
>> It’s a shame; I could set up a great heart to put something like
>> autoscaling on for you now. But the ship has sailed. It’s very hard to claw
>> that back, and the world has adjusted to getting what it can from what is.
>>
>> But yeah, Curator is a huge improvement on a variety of those issues. And
>> I invested enough into it to know it’s good. It’s fast. It has better and
>> more APIs and algorithms, and they’re documented. It’s maintained and pushed
>> forward by a separate group dedicated to the task.
>>
>> I can tell you, though, it’s by no means some kind of Rubik’s cube, but it
>> is no small lift.
>>
>> Mark
>>
>> On Wed, Sep 29, 2021 at 9:13 AM Mark Miller <[email protected]>
>> wrote:
>>
>>> I very much agree. That code is the root of a very surprising amount of
>>> evil and has been for a surprisingly long time.
>>>
>>> There is a long list of reasons, which I won’t enumerate, why I don’t see
>>> that as likely to happen, though, starting with the fact that I’ve brought
>>> it up to various people over a couple of years and gotten pushback just at
>>> the top. Roughly, it’s on a scale of work and invasiveness, even with some
>>> incremental paths, that I don’t see the path or resources to seriously
>>> consider it myself. You can go back through JIRA history for quite a while
>>> before you find that kind of item not looking out of place.
>>>
>>> Mark
>>>
>>> On Wed, Sep 29, 2021 at 2:05 AM Andrzej Białecki <[email protected]> wrote:
>>>
>>>> +1 to start working towards using Curator. This is long overdue, and
>>>> sooner or later we need to eat this frog: as you dig deeper and deeper, it
>>>> turns out that many issues in Solr can be attributed to our home-grown ZK
>>>> code, and there are maybe 2 people on the Solr team who understand what’s
>>>> going on there (and I’m certainly not one of them!). And the maintenance
>>>> cost is just too high over time.
>>>>
>>>> —
>>>>
>>>> Andrzej Białecki
>>>>
>>>> On 28 Sep 2021, at 21:31, Mark Miller <[email protected]> wrote:
>>>>
>>>> P.S. This is not actually the ZooKeeper design I would submit to any
>>>> competition :)
>>>>
>>>> I’ve gone different routes in addressing the ZooKeeper shortfall. This
>>>> one is relatively easy, impactful, and isolated for the right developer.
>>>>
>>>> Personally, with fewer scale and isolation limits, by far the best
>>>> thing I’ve done is remove almost all of our ZK recipes and custom stuff and
>>>> use Apache Curator, replacing our stuff as well as improving and expanding
>>>> on things using its large stable of well-behaved recipes. I don’t think raw
>>>> ZooKeeper is good for a project of more than a few people at most. But I
>>>> wouldn’t toss that out there; it’s a much larger undertaking, and no one is
>>>> going to bite on that in passing.
>>>>
>>>> Mark
>>>> --
>>>> - Mark
>>>>
>>>> http://about.me/markrmiller
>>>>
