GitHub user numinnex created a discussion: Metadata layer in cluster.
Related issue: #2339 As part of the clustering functionality, we need to implement SMR (State Machine Replication) for our metadata (streams, topics, consumer groups, users, etc.). Currently, I am thinking about a few different options for how to model parts of the abstraction that we need. 1) State machine (or state machines) There are a few options. One is to have a single state machine that covers all metadata functionality, or to have multiple state machines (one each for streams, topics, consumer groups, etc.). Each approach has its pros and cons. Single state machine: Pros: Simpler mental model and implementation - no need to reason about multiple state machines, side effects, and potential interactions between them. One atomic snapshot. Cons: Worse scalability (both performance and extensibility). Since there is one global snapshot, fast-forwarding new/lagging replicas goes through a single chokepoint. Mux state machine: Pros: Inversion of all the cons of the single state machine approach. Additionally, with multiple state machines we can defer deserialization to each state machine, avoiding eagerly deserializing the command payload. We can also version snapshots for each state machine independently. Cons: More complex mental model and implementation - need to reason about interactions and side effects between state machines. Snapshot coordination becomes more complex. 2) JournalLog (where we will store uncommitted and committed ops) There are a few options as well, but I think it's much clearer what to do here than with the STM. I think an on-disk circular buffer (considering that we can calculate an upper bound size for each metadata command) that is snapshotable is more than enough for our use-case. 3) Snapshot Given the design choice of JournalLog, I think we need a full count-based snapshotting mechanism that will use sequence as its revision number. The snapshot needs to be stored in 3 copies (for redundancy) and needs to be composable from smaller snapshots (individual STMs in case of MuxStateMachine). Maybe it's actually a good idea to take incremental snapshots of each STM and create a global snapshot that would be a full snapshot of the metadata? Something worth exploring if we choose the MuxStateMachine variant. Those are the main points that I would like to hear your input on, if you have anything that you think is worth considering as well, feel free to include it in your comments. GitHub link: https://github.com/apache/iggy/discussions/2346 ---- This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
