I go into a bunch of this silliness in the presentation I put together on the Overseer. I’ll look into sharing it.
ZkStateReader is obviously not in the Overseer class, but it’s all one communication system; it’s de facto Overseer code and only slightly better than the Overseer class. A state update, at its core, is 2 bits of state information plus, if we’re being efficient but not obsessively efficient, a 32-bit replica ID. Call it 34 bits to be shared around. To do that, we send far, far more than those 34 bits through a mostly serial system awash in object creation, JSON construction, JSON parsing, and data shuffling, all to the beat of a cluster full of different impatient drummers, even though the system was designed for patience.

The immutable ClusterState object in ZkStateReader is one of the most obviously silly components. Although every collection’s state is totally independent (you don’t even need the same Overseer managing them all), the entire communication process works serially across collections. In this case, that serialization is enforced by the cluster state lock. So if our 34 bits of info (in our saner imagined world) comes down for a few collections, then one at a time, after parsing a pile of extraneous JSON, we rebuild the entire graph of cluster state objects with 2 bits of difference, and then we do it again, serially, for each collection that has a state update. This kind of thing happens all along the path a 2-bit state update travels, multiplied by X possible spam repeats or long-ago superseded updates.

So yeah, in the SolrCloud prototype standup, the cluster state was made immutable essentially as a simple way to shield developers from concurrency issues when working with it. The cost analysis before anything was built was something like: it’s really not so bad. Add up all of those costs that landed and stuck, and it really is so bad. And even if you really wanted to keep handing out an immutable cluster state object, that still doesn’t mean ZkStateReader has to use one internally just because that’s what it gives out with getClusterState.

Separating replica states from cluster structure is useful for getting to an efficient cluster state structure and update strategy in ZkStateReader, if only so that you know what you actually *need* to update. If you get a collection’s worth of JSON, you have to update it all, or do some silly gymnastics to reverse-engineer what the update actually is.

For me, a concurrent hashmap in ZkStateReader was better than a cluster state object. It mapped a collection to some kind of collection state, and for each replica state I kept an atomic integer indicating the state (a rough sketch follows below). Then you can throw out the global lock and forget about locking or object creation when updating replica states. The entire communication path can easily be made 100x+ more scalable with changes that are just as simple and straightforward. “Oh, this is crazy. And it’s easy to do something that’s not.” And without any brain sweat, you end up with a system that works in parallel on independent work, transmits state sizes close to the actual state that needs transmitting, doesn’t spam updates that are unnecessary or already outdated, and operates at a designed developer drum beat rather than to an arbitrary army of drummers.
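To put a size on the mismatch, here is a minimal sketch of what a single state update actually carries. The type name and state constants are hypothetical, not Solr’s actual wire format; it’s just the “34 bits of real information” point made concrete, to be compared against re-sending and re-parsing a collection’s worth of JSON.

```java
// Hypothetical illustration, not Solr's actual update format: the entire
// information content of one replica state update is a 32-bit replica id
// plus a state value that fits in a couple of bits.
public record ReplicaStateUpdate(int replicaId, byte state) {
  public static final byte DOWN = 0;
  public static final byte RECOVERING = 1;
  public static final byte ACTIVE = 2;
  public static final byte RECOVERY_FAILED = 3;
}
```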
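And to make the concurrent-map idea concrete, here is a minimal sketch of the structure described above. The class and method names are mine, not ZkStateReader’s actual internals; the point is that a replica state change becomes a single lock-free write into an AtomicInteger instead of a full ClusterState rebuild under a global lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a mutable, lock-free cluster state structure:
// collection name -> (replica name -> mutable state slot).
public class LiveClusterState {

  private final Map<String, ConcurrentHashMap<String, AtomicInteger>> collections =
      new ConcurrentHashMap<>();

  // Apply one state update: no global lock, no object churn beyond the
  // first time a collection or replica is seen.
  public void updateReplicaState(String collection, String replica, int state) {
    collections
        .computeIfAbsent(collection, c -> new ConcurrentHashMap<>())
        .computeIfAbsent(replica, r -> new AtomicInteger(state))
        .set(state);
  }

  // Read the current state, or -1 if we have never heard of this replica.
  public int getReplicaState(String collection, String replica) {
    Map<String, AtomicInteger> replicas = collections.get(collection);
    if (replicas == null) {
      return -1;
    }
    AtomicInteger state = replicas.get(replica);
    return state == null ? -1 : state.get();
  }
}
```

Updates for different collections, or for different replicas in the same collection, never contend with each other, which is the “works in parallel on independent work” property described above.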