lhotari commented on code in PR #1138: URL: https://github.com/apache/pulsar-site/pull/1138#discussion_r3180358414
########## docs/administration-autorecovery.md: ########## @@ -0,0 +1,143 @@ +--- +id: administration-autorecovery +title: BookKeeper AutoRecovery +sidebar_label: "BookKeeper AutoRecovery" +description: Learn how BookKeeper AutoRecovery works in Pulsar, and how to configure, enable, disable, and monitor it. +--- + +When a bookie in a Pulsar cluster becomes unavailable, the ledgers it stored become under-replicated, meaning the data no longer meets the configured replication factor. BookKeeper provides **AutoRecovery** to detect this situation automatically and rereplicate affected ledgers to healthy bookies — without manual intervention. + +## How AutoRecovery works + +AutoRecovery runs as two concurrent components within the `autorecovery` process: + +### Auditor + +The **Auditor** is a singleton node, elected via ZooKeeper leader election from all nodes participating in AutoRecovery. When a bookie fails or is reported unavailable, the Auditor: + +1. Scans all ledgers to identify those that stored data on the failed bookie. +2. Publishes rereplication tasks to the `/underreplicated/` ZooKeeper znode. + +Only one Auditor is active in a cluster at any time. If the current Auditor node fails, a new leader election takes place automatically. + +### Replication Worker + +A **Replication Worker** runs on every node participating in AutoRecovery. Each worker: + +1. Monitors the `/underreplicated/` ZooKeeper znode for tasks published by the Auditor. +2. Acquires a ZooKeeper ephemeral lock on an available task (ensuring no two workers replicate the same fragment simultaneously). +3. Copies the under-replicated ledger fragments to its local bookie to restore the configured replication factor. + +## Deploy AutoRecovery + +AutoRecovery can be deployed in two ways: + +- **Alongside bookies** — each bookie also runs an AutoRecovery thread (default behavior when `autoRecoveryDaemonEnabled=true`). +- **On dedicated nodes** — separate machines run only AutoRecovery. This is recommended for large clusters where you want to isolate recovery I/O from normal bookie traffic. Set `autoRecoveryDaemonEnabled=false` on the bookies to disable the embedded AutoRecovery thread. + +## Start AutoRecovery + +To start AutoRecovery as a standalone process in the foreground: + +```shell +bin/bookkeeper autorecovery +``` + +To start AutoRecovery as a background daemon: + +```shell +bin/pulsar-daemon start autorecovery +``` + +Ensure [`zkServers`](reference-configuration.md#bookkeeper-zkServers) in `conf/bookkeeper.conf` points to your ZooKeeper cluster before starting. Review Comment: `zkServers` is a deprecated setting. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
