[ https://issues.apache.org/jira/browse/HBASE-28957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HBASE-28957:
-----------------------------------
Labels: pull-request-available (was: )
> Adding support for continuous Backup and Point-in-Time Recovery
> ---------------------------------------------------------------
>
> Key: HBASE-28957
> URL: https://issues.apache.org/jira/browse/HBASE-28957
> Project: HBase
> Issue Type: Umbrella
> Components: backup&restore
> Affects Versions: 2.6.0, 3.0.0-alpha-4
> Reporter: Ankit Singhal
> Priority: Major
> Labels: pull-request-available
>
> Current solutions like replication and snapshots offer data redundancy but
> have limitations that prevent effective point-in-time recovery in cases of
> data corruption or accidental changes. Replication requires maintaining a
> live cluster that mirrors the original, which incurs substantial costs to
> keep both clusters operational. Snapshots, on the other hand, do not support
> point-in-time recovery, leading to potential data loss between snapshots.
> Incremental backups narrow this gap but still do not provide full
> protection, as they capture data only at scheduled intervals.
>
> Limitations of the Current Incremental Backup Solution
> The current incremental backup solution in HBase has several critical
> limitations that highlight the need for continuous backup and point-in-time
> recovery (PITR):
> • Risk of Data Loss: Since incremental backups are created in
> batches rather than continuously, any changes made since the last backup are
> at risk of being lost if data corruption or deletion occurs before the next
> scheduled backup.
> • Restore Point Limitations: Users can only restore data to
> specific backup timestamps rather than to any exact moment in time, which
> restricts flexibility and the ability to revert to the most recent stable
> state before an issue occurred.
> • WAL Management Challenges: Write-Ahead Logs on the source
> cluster cannot be archived until the backup process completes, making WAL
> management complex and storage-intensive on the source cluster.
> • Complex Backup Tracking: Managing backup IDs, job history, and
> logs is currently challenging, requiring substantial manual tracking and
> oversight to ensure consistency.
> • Dependency on YARN: The incremental backup process relies on a
> YARN cluster to move WALs, adding both resource dependency and complexity to
> the backup workflow.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)