Andrew Kyle Purtell created HBASE-29975:
-------------------------------------------
Summary: TLA+ specification of the AssignmentManager
Key: HBASE-29975
URL: https://issues.apache.org/jira/browse/HBASE-29975
Project: HBase
Issue Type: Task
Components: master, proc-v2, Region Assignment
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
TLA+ is a formal specification language for designing and verifying concurrent
and distributed systems. The name stands for the "Temporal Logic of Actions", a
mathematical framework that combines first-order logic with temporal operators
to reason about how system state evolves over time. When the model is a
high-fidelity representation of the real system, proposed design and
architectural changes can be checked against the full space of possible
executions, surfacing logic bugs at design time, before any code is written.
This can save weeks or months of development effort that would otherwise be
spent discovering and debugging subtle concurrency issues in a running system.
A TLA+ specification describes a system as a state machine, an initial state
predicate (`Init`), a next-state relation (`Next`) that defines every legal
transition, and a collection of invariants, properties that must hold in every
reachable state. The TLC model checker then systematically explores every
possible execution of this state machine, checking each property at every
state. If a property is violated, TLC produces a minimal counterexample trace
showing the exact sequence of steps that led to the failure.
The HBase AssignmentManager is a core component of the HBase master that manages
the lifecycle of regions across a cluster of RegionServers. It coordinates
region assignment, unassignment, moves, and reopens; handles RegionServer
crashes through the ServerCrashProcedure (SCP); and recovers its own state
after a master crash through a durable procedure store. The correctness of
these interactions is critical. Bugs can cause data loss (double assignment /
split-brain writes), data unavailability (lost or stuck regions), or cluster
hangs (deadlocked procedures).
This TLA+ specification models the AssignmentManager as a state machine
capturing:
- Region lifecycle: in-memory master state and persistent hbase:meta state,
tracking regions through OFFLINE → OPENING → OPEN → CLOSING → CLOSED (and the
failure states FAILED_OPEN and ABNORMALLY_CLOSED).
- Asynchronous RPC channels: Master-to-RS commands and RS-to-master transition
reports, modeled as sets of records with non-deterministic delivery.
- Procedure state: State records (type, step, target server, retry count),
with a durable procedure store that survives master crashes.
- Server liveness: Per-server online/crashed state, ZooKeeper ephemeral
nodes, and WAL fencing state.
- Crash recovery: Multi-step ServerCrashProcedure (detect → assign meta →
snapshot regions → fence WALs → reassign) and master crash/recovery (volatile
state lost, durable state replayed).
- PEWorker thread pool: Available worker count, async suspension, and sync
blocking when `hbase:meta` is unavailable during SCP meta-reassignment.
The specification defines safety invariants verified at every reachable state,
including the NoDoubleAssignment (no region writable on two servers),
MetaConsistency (persistent and in-memory state agree), FencingOrder (WALs
fenced before reassignment), NoLostRegions (no region stuck without a procedure
after crash recovery), and NoPEWorkerDeadlock (thread pool exhaustion
detection). The liveness property (MetaEventuallyAssigned) verifies that
`hbase:meta` is eventually reassigned after a crash. Action constraints enforce
transition validity and SCP monotonicity.
The model is under active development and can found at:
https://github.com/apurtell/hbase/tree/WORK-architecture/src/main/spec
--
This message was sent by Atlassian Jira
(v8.20.10#820010)