I go into a bunch of this silliness in the presentation I put together on the Overseer. I’ll look into sharing it.
ZkStateReader is obviously not in the Overseer class, but it’s all one communication system; it’s de facto Overseer code and only slightly better than the Overseer class. A state update, at its core, is 2 bits of state information plus, if we’re being efficient but not obsessively efficient, a 32-bit replica ID. Call it 34 bits to be shared around. To do that, we send far, far more than those 34 bits through a mostly serial system awash in object creation, JSON construction, JSON parsing, and data shuffling, all to the beat of a cluster full of different impatient drummers, even though the system was designed for patience.

The immutable ClusterState object in ZkStateReader is one of the most obviously silly components. Although every collection’s state is totally independent (you don’t even need the same Overseer managing them all), the entire communication process works serially across collections. In this case, that serialization is enforced by the cluster state lock. So if our 34 bits of info (in our saner imagined world) comes down for a few collections, then one at a time, after parsing a pile of extraneous JSON, we rebuild the entire graph of cluster state objects with 2 bits of difference, and then we do it again, serially, for each collection that has a state update. This kind of thing happens all along the path a 2-bit state update travels, multiplied by X possible spam repeats or long-ago superseded updates.

So yeah, in the SolrCloud prototype standup, the cluster state was made immutable essentially as a simple way to shield developers from concurrency issues when working with it. The cost analysis before anything was built was something like: it’s really not so bad. Add up all of those costs that landed and stuck, and it really is so bad. And even if you really wanted to keep handing out an immutable cluster state object, that still doesn’t mean ZkStateReader has to use one internally just because that’s what it gives out with getClusterState.

Separating replica states from cluster structure is useful for getting to an efficient cluster state structure and update strategy in ZkStateReader, if only so that you know what you actually *need* to update. If you get a collection’s worth of JSON, you have to update it all, or do some silly gymnastics to reverse-engineer what the update actually is.

For me, a concurrent hashmap in ZkStateReader was better than a cluster state object. It mapped a collection to some kind of collection state, and for each replica state I kept an atomic integer indicating the state (a rough sketch follows below). Then you can throw out the global lock and forget about locking or object creation when updating replica states. The entire communication path can easily be made 100x+ more scalable with changes that are just as simple and straightforward. “Oh, this is crazy. And it’s easy to do something that’s not.” And without any brain sweat, you end up with a system that works in parallel on independent work, transmits state sizes close to the actual state that needs transmitting, doesn’t spam updates that are unnecessary or already outdated, and operates at a designed developer drum beat rather than to an arbitrary army of drummers.
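To put a size on the mismatch, here is a minimal sketch of what a single state update actually carries. The type name and state constants are hypothetical, not Solr’s actual wire format; it’s just the “34 bits of real information” point made concrete, to be compared against re-sending and re-parsing a collection’s worth of JSON.

```java
// Hypothetical illustration, not Solr's actual update format: the entire
// information content of one replica state update is a 32-bit replica id
// plus a state value that fits in a couple of bits.
public record ReplicaStateUpdate(int replicaId, byte state) {
  public static final byte DOWN = 0;
  public static final byte RECOVERING = 1;
  public static final byte ACTIVE = 2;
  public static final byte RECOVERY_FAILED = 3;
}
```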
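And to make the concurrent-map idea concrete, here is a minimal sketch of the structure described above. The class and method names are mine, not ZkStateReader’s actual internals; the point is that a replica state change becomes a single lock-free write into an AtomicInteger instead of a full ClusterState rebuild under a global lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a mutable, lock-free cluster state structure:
// collection name -> (replica name -> mutable state slot).
public class LiveClusterState {

  private final Map<String, ConcurrentHashMap<String, AtomicInteger>> collections =
      new ConcurrentHashMap<>();

  // Apply one state update: no global lock, no object churn beyond the
  // first time a collection or replica is seen.
  public void updateReplicaState(String collection, String replica, int state) {
    collections
        .computeIfAbsent(collection, c -> new ConcurrentHashMap<>())
        .computeIfAbsent(replica, r -> new AtomicInteger(state))
        .set(state);
  }

  // Read the current state, or -1 if we have never heard of this replica.
  public int getReplicaState(String collection, String replica) {
    Map<String, AtomicInteger> replicas = collections.get(collection);
    if (replicas == null) {
      return -1;
    }
    AtomicInteger state = replicas.get(replica);
    return state == null ? -1 : state.get();
  }
}
```

Updates for different collections, or for different replicas in the same collection, never contend with each other, which is the “works in parallel on independent work” property described above.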