Andrew Kyle Purtell created HBASE-29975:
-------------------------------------------

             Summary: TLA+ specification of the AssignmentManager
                 Key: HBASE-29975
                 URL: https://issues.apache.org/jira/browse/HBASE-29975
             Project: HBase
          Issue Type: Task
          Components: master, proc-v2, Region Assignment
            Reporter: Andrew Kyle Purtell
            Assignee: Andrew Kyle Purtell


TLA+ is a formal specification language for designing and verifying concurrent 
and distributed systems. The name stands for the "Temporal Logic of Actions", a 
mathematical framework that combines first-order logic with temporal operators 
to reason about how system state evolves over time. When the model is a 
high-fidelity representation of the real system, proposed design and 
architectural changes can be checked against the full space of possible 
executions, surfacing logic bugs at design time, before any code is written. 
This can save weeks or months of development effort that would otherwise be 
spent discovering and debugging subtle concurrency issues in a running system.

A TLA+ specification describes a system as a state machine, an initial state 
predicate (`Init`), a next-state relation (`Next`) that defines every legal 
transition, and a collection of invariants, properties that must hold in every 
reachable state. The TLC model checker then systematically explores every 
possible execution of this state machine, checking each property at every 
state. If a property is violated, TLC produces a minimal counterexample trace 
showing the exact sequence of steps that led to the failure.

The HBase AssignmentManager is a core component of the HBase master that manages
the lifecycle of regions across a cluster of RegionServers. It coordinates
region assignment, unassignment, moves, and reopens; handles RegionServer
crashes through the ServerCrashProcedure (SCP); and recovers its own state
after a master crash through a durable procedure store. The correctness of
these interactions is critical. Bugs can cause data loss (double assignment /
split-brain writes), data unavailability (lost or stuck regions), or cluster
hangs (deadlocked procedures).

This TLA+ specification models the AssignmentManager as a state machine 
capturing:

- Region lifecycle: in-memory master state and persistent hbase:meta state, 
tracking regions through OFFLINE → OPENING  → OPEN → CLOSING → CLOSED (and the 
failure states FAILED_OPEN and ABNORMALLY_CLOSED).
- Asynchronous RPC channels: Master-to-RS commands and RS-to-master transition 
reports, modeled as sets of records with non-deterministic delivery.
- Procedure state: State records (type, step, target  server, retry count), 
with a durable procedure store that survives master crashes.
- Server liveness: Per-server online/crashed state, ZooKeeper ephemeral
  nodes, and WAL fencing state.
- Crash recovery: Multi-step ServerCrashProcedure (detect → assign meta →
  snapshot regions → fence WALs → reassign) and master crash/recovery (volatile
  state lost, durable state replayed).
- PEWorker thread pool: Available worker count, async suspension, and sync 
blocking when `hbase:meta` is unavailable during SCP meta-reassignment.

The specification defines safety invariants verified at every reachable state, 
including the NoDoubleAssignment (no region writable on two servers), 
MetaConsistency (persistent and in-memory state agree), FencingOrder (WALs 
fenced before reassignment), NoLostRegions (no region stuck without a procedure 
after crash recovery), and NoPEWorkerDeadlock (thread pool exhaustion 
detection). The liveness property (MetaEventuallyAssigned) verifies that 
`hbase:meta` is eventually reassigned after a crash. Action constraints enforce 
transition validity and SCP monotonicity.

The model is under active development and can found at:

https://github.com/apurtell/hbase/tree/WORK-architecture/src/main/spec



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to