Hi all,

We would like to propose merging the feature “Continuous Backup and Point-in-Time Recovery (PITR)” into the main branch.

Background

Existing mechanisms such as replication and snapshots provide data redundancy but are insufficient for effective point-in-time recovery.

- *Replication* requires maintaining a live mirror cluster, which significantly increases operational costs.
- *Snapshots* and *incremental snapshots* only capture data at discrete points in time, resulting in possible data loss between snapshots.

Limitations of the Current Incremental Backup Solution

The existing incremental backup framework in HBase exhibits several limitations:

- *Risk of data loss:* Incremental backups are batch-based, leading to potential data loss between backup intervals.
- *Limited restore flexibility:* Recovery is restricted to specific backup timestamps rather than any desired point in time.
- *WAL management overhead:* Write-Ahead Logs (WALs) cannot be archived until the backup operation completes, increasing storage overhead and complexity.
- *Complex tracking:* Manual tracking of backup IDs, job history, and logs introduces operational challenges.

Summary of the Proposed Feature

The *Continuous Backup and PITR* feature introduces a continuous and fine-grained backup mechanism that addresses the above limitations. It enables:

- Continuous archival of WALs to support near real-time backup.
- Restoration of data to any desired point in time (PITR) for improved data protection and flexibility.
- Simplified backup lifecycle and WAL management.

A detailed description of the design and implementation can be found in the following document:

Design Document: Continuous Backup and Point-in-Time Recovery
<https://docs.google.com/document/d/1csQBMyM1mwpe4QpWkCbyqvsC9F5nUBr4ierOo8IuGpE/edit?pli=1&tab=t.0>

Please review and share your feedback or comments.

Best regards,
Vinayak Hegde
