Metadata layer in cluster.

via GitHub Thu, 13 Nov 2025 10:57:48 -0800


GitHub user numinnex created a discussion: Metadata layer in cluster.


Related issue: #2339

As part of the clustering functionality, we need to implement SMR (State 
Machine Replication) for our metadata (streams, topics, consumer groups, users, 
etc.).
Currently, I am thinking about a few different options for how to model parts 
of the abstraction that we need.
1) State machine (or state machines)
There are a few options. One is to have a single state machine that covers all 
metadata functionality, or to have multiple state machines (one each for 
streams, topics, consumer groups, etc.). Each approach has its pros and cons.
Single state machine:
Pros:

Simpler mental model and implementation - no need to reason about multiple 
state machines, side effects, and potential interactions between them.
One atomic snapshot.

Cons:

Worse scalability (both performance and extensibility).
Since there is one global snapshot, fast-forwarding new/lagging replicas goes 
through a single chokepoint.

Mux state machine:
Pros:

Inversion of all the cons of the single state machine approach. Additionally, 
with multiple state machines we can defer deserialization to each state 
machine, avoiding eagerly deserializing the command payload. We can also 
version snapshots for each state machine independently.

Cons:

More complex mental model and implementation - need to reason about 
interactions and side effects between state machines.
Snapshot coordination becomes more complex.

2) JournalLog (where we will store uncommitted and committed ops)
There are a few options as well, but I think it's much clearer what to do here 
than with the STM. I think an on-disk circular buffer (considering that we can 
calculate an upper bound size for each metadata command) that is snapshotable 
is more than enough for our use-case.

3) Snapshot
Given the design choice of JournalLog, I think we need a full count-based 
snapshotting mechanism that will use sequence as its revision number. The 
snapshot needs to be stored in 3 copies (for redundancy) and needs to be 
composable from smaller snapshots (individual STMs in case of MuxStateMachine).
Maybe it's actually a good idea to take incremental snapshots of each STM and 
create a global snapshot that would be a full snapshot of the metadata? 
Something worth exploring if we choose the MuxStateMachine variant.


Those are the main points that I would like to hear your input on, if you have 
anything that you think is worth considering as well, feel free to include it 
in your comments.

GitHub link: https://github.com/apache/iggy/discussions/2346

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

Metadata layer in cluster.

Reply via email to