[ 
https://issues.apache.org/jira/browse/HBASE-29975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-29975:
----------------------------------------
    Description: 
TLA+ is a formal specification language for designing and verifying concurrent 
and distributed systems. The name stands for the "Temporal Logic of Actions", a 
mathematical framework that combines first-order logic with temporal operators 
to reason about how system state evolves over time. When the model is a 
high-fidelity representation of the real system, proposed design and 
architectural changes can be checked against the full space of possible 
executions, surfacing logic bugs at design time, before any code is written. 
This can save weeks or months of development effort that would otherwise be 
spent discovering and debugging subtle concurrency issues in a running system.

A TLA+ specification describes a system as a state machine, an initial state 
predicate (`Init`), a next-state relation (`Next`) that defines every legal 
transition, and a collection of invariants, properties that must hold in every 
reachable state. The TLC model checker then systematically explores every 
possible execution of this state machine, checking each property at every 
state. If a property is violated, TLC produces a minimal counterexample trace 
showing the exact sequence of steps that led to the failure.

The HBase AssignmentManager is a core component of the HBase master that 
manages the lifecycle of regions across a cluster of RegionServers. It 
coordinates region assignment, unassignment, moves, and reopens, handles 
RegionServer crashes through the ServerCrashProcedure (SCP), and recovers its 
own state after a master crash through a durable procedure store. The 
correctness of these interactions is critical.

This TLA+ specification models the AssignmentManager as a state machine 
capturing:

- Region lifecycle: in-memory master state and persistent hbase:meta state, 
tracking regions through OFFLINE → OPENING  → OPEN → CLOSING → CLOSED (and the 
failure states FAILED_OPEN and ABNORMALLY_CLOSED).
- Asynchronous RPC channels: Master-to-RS commands and RS-to-master transition 
reports, modeled as sets of records with non-deterministic delivery.
- Procedure state: State records (type, step, target  server, retry count), 
with a durable procedure store that survives master crashes.
- Server liveness: Per-server online/crashed state, ZooKeeper ephemeral
  nodes, and WAL fencing state.
- Crash recovery: Multi-step ServerCrashProcedure (detect → assign meta →
  snapshot regions → fence WALs → reassign) and master crash/recovery (volatile
  state lost, durable state replayed).
- PEWorker thread pool: Available worker count, async suspension, and sync 
blocking when `hbase:meta` is unavailable during SCP meta-reassignment.

The specification defines safety invariants verified at every reachable state, 
including the NoDoubleAssignment (no region writable on two servers), 
MetaConsistency (persistent and in-memory state agree), FencingOrder (WALs 
fenced before reassignment), NoLostRegions (no region stuck without a procedure 
after crash recovery), and NoPEWorkerDeadlock (thread pool exhaustion 
detection). The liveness property (MetaEventuallyAssigned) verifies that 
`hbase:meta` is eventually reassigned after a crash. Action constraints enforce 
transition validity and SCP monotonicity.

The model is under active development and can found at:

https://github.com/apurtell/hbase/tree/WORK-architecture/src/main/spec

  was:
TLA+ is a formal specification language for designing and verifying concurrent 
and distributed systems. The name stands for the "Temporal Logic of Actions", a 
mathematical framework that combines first-order logic with temporal operators 
to reason about how system state evolves over time. When the model is a 
high-fidelity representation of the real system, proposed design and 
architectural changes can be checked against the full space of possible 
executions, surfacing logic bugs at design time, before any code is written. 
This can save weeks or months of development effort that would otherwise be 
spent discovering and debugging subtle concurrency issues in a running system.

A TLA+ specification describes a system as a state machine, an initial state 
predicate (`Init`), a next-state relation (`Next`) that defines every legal 
transition, and a collection of invariants, properties that must hold in every 
reachable state. The TLC model checker then systematically explores every 
possible execution of this state machine, checking each property at every 
state. If a property is violated, TLC produces a minimal counterexample trace 
showing the exact sequence of steps that led to the failure.

The HBase AssignmentManager is a core component of the HBase master that manages
the lifecycle of regions across a cluster of RegionServers. It coordinates
region assignment, unassignment, moves, and reopens; handles RegionServer
crashes through the ServerCrashProcedure (SCP); and recovers its own state
after a master crash through a durable procedure store. The correctness of
these interactions is critical. Bugs can cause data loss (double assignment /
split-brain writes), data unavailability (lost or stuck regions), or cluster
hangs (deadlocked procedures).

This TLA+ specification models the AssignmentManager as a state machine 
capturing:

- Region lifecycle: in-memory master state and persistent hbase:meta state, 
tracking regions through OFFLINE → OPENING  → OPEN → CLOSING → CLOSED (and the 
failure states FAILED_OPEN and ABNORMALLY_CLOSED).
- Asynchronous RPC channels: Master-to-RS commands and RS-to-master transition 
reports, modeled as sets of records with non-deterministic delivery.
- Procedure state: State records (type, step, target  server, retry count), 
with a durable procedure store that survives master crashes.
- Server liveness: Per-server online/crashed state, ZooKeeper ephemeral
  nodes, and WAL fencing state.
- Crash recovery: Multi-step ServerCrashProcedure (detect → assign meta →
  snapshot regions → fence WALs → reassign) and master crash/recovery (volatile
  state lost, durable state replayed).
- PEWorker thread pool: Available worker count, async suspension, and sync 
blocking when `hbase:meta` is unavailable during SCP meta-reassignment.

The specification defines safety invariants verified at every reachable state, 
including the NoDoubleAssignment (no region writable on two servers), 
MetaConsistency (persistent and in-memory state agree), FencingOrder (WALs 
fenced before reassignment), NoLostRegions (no region stuck without a procedure 
after crash recovery), and NoPEWorkerDeadlock (thread pool exhaustion 
detection). The liveness property (MetaEventuallyAssigned) verifies that 
`hbase:meta` is eventually reassigned after a crash. Action constraints enforce 
transition validity and SCP monotonicity.

The model is under active development and can found at:

https://github.com/apurtell/hbase/tree/WORK-architecture/src/main/spec


> TLA+ specification of the AssignmentManager
> -------------------------------------------
>
>                 Key: HBASE-29975
>                 URL: https://issues.apache.org/jira/browse/HBASE-29975
>             Project: HBase
>          Issue Type: Task
>          Components: master, proc-v2, Region Assignment
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>
> TLA+ is a formal specification language for designing and verifying 
> concurrent and distributed systems. The name stands for the "Temporal Logic 
> of Actions", a mathematical framework that combines first-order logic with 
> temporal operators to reason about how system state evolves over time. When 
> the model is a high-fidelity representation of the real system, proposed 
> design and architectural changes can be checked against the full space of 
> possible executions, surfacing logic bugs at design time, before any code is 
> written. This can save weeks or months of development effort that would 
> otherwise be spent discovering and debugging subtle concurrency issues in a 
> running system.
> A TLA+ specification describes a system as a state machine, an initial state 
> predicate (`Init`), a next-state relation (`Next`) that defines every legal 
> transition, and a collection of invariants, properties that must hold in 
> every reachable state. The TLC model checker then systematically explores 
> every possible execution of this state machine, checking each property at 
> every state. If a property is violated, TLC produces a minimal counterexample 
> trace showing the exact sequence of steps that led to the failure.
> The HBase AssignmentManager is a core component of the HBase master that 
> manages the lifecycle of regions across a cluster of RegionServers. It 
> coordinates region assignment, unassignment, moves, and reopens, handles 
> RegionServer crashes through the ServerCrashProcedure (SCP), and recovers its 
> own state after a master crash through a durable procedure store. The 
> correctness of these interactions is critical.
> This TLA+ specification models the AssignmentManager as a state machine 
> capturing:
> - Region lifecycle: in-memory master state and persistent hbase:meta state, 
> tracking regions through OFFLINE → OPENING  → OPEN → CLOSING → CLOSED (and 
> the failure states FAILED_OPEN and ABNORMALLY_CLOSED).
> - Asynchronous RPC channels: Master-to-RS commands and RS-to-master 
> transition reports, modeled as sets of records with non-deterministic 
> delivery.
> - Procedure state: State records (type, step, target  server, retry count), 
> with a durable procedure store that survives master crashes.
> - Server liveness: Per-server online/crashed state, ZooKeeper ephemeral
>   nodes, and WAL fencing state.
> - Crash recovery: Multi-step ServerCrashProcedure (detect → assign meta →
>   snapshot regions → fence WALs → reassign) and master crash/recovery 
> (volatile
>   state lost, durable state replayed).
> - PEWorker thread pool: Available worker count, async suspension, and sync 
> blocking when `hbase:meta` is unavailable during SCP meta-reassignment.
> The specification defines safety invariants verified at every reachable 
> state, including the NoDoubleAssignment (no region writable on two servers), 
> MetaConsistency (persistent and in-memory state agree), FencingOrder (WALs 
> fenced before reassignment), NoLostRegions (no region stuck without a 
> procedure after crash recovery), and NoPEWorkerDeadlock (thread pool 
> exhaustion detection). The liveness property (MetaEventuallyAssigned) 
> verifies that `hbase:meta` is eventually reassigned after a crash. Action 
> constraints enforce transition validity and SCP monotonicity.
> The model is under active development and can found at:
> https://github.com/apurtell/hbase/tree/WORK-architecture/src/main/spec



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to