[ 
https://issues.apache.org/jira/browse/HBASE-29220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944640#comment-17944640
 ] 

Vinayak Hegde commented on HBASE-29220:
---------------------------------------

h3. Overview

We are implementing a mechanism to compute the replication checkpoint timestamp 
used in continuous backups. This timestamp helps ensure that any data restored 
during Point-in-Time Restore (PITR) or incremental backups reflects a 
consistent and fully replicated state — i.e., no WAL entries prior to the 
checkpoint are missing from the target system.

This logic relies heavily on two core components of HBase's replication 
framework:
h3. What are WAL Replication Queues and the Replication Marker Chore?

1. WAL Replication Queues:
Each region server in HBase maintains a queue of WALs that still need to be 
replicated to a peer cluster. These queues are persisted by the replication 
framework (in ZooKeeper, or in a replication table on newer HBase versions). 
The WALs in the queue represent the unreplicated or 
partially replicated data. The start timestamp of each WAL indicates the 
earliest possible point at which replication work may still be pending for that 
server.

2. Replication Marker Chore:
This is a background process that periodically scans the replication status and 
updates a marker table in HBase. For each region server, it writes the latest 
timestamp up to which data is known to be successfully replicated. This 
provides a higher-level summary of replication progress, independent of WAL 
structures.
h3. How We Use These in Our Approach

We combine both the WAL replication queues and the replication marker chore to 
calculate the replication checkpoint:
 * From WAL Queues:
We extract the start timestamps of the oldest WALs for each region server. 
These serve as a conservative lower bound — replication cannot be considered 
complete before these points.

 * From Marker Chore:
We read the last replicated timestamp per region server from the marker table. 
These can show more advanced replication progress, especially if WALs haven't 
been rolled recently.

 * Combining the Two:
For each region server:

 ** If it has both WAL and marker entries, and the marker timestamp is more 
recent, we use the marker (replication has moved beyond the WAL's start); 
otherwise we keep the WAL's start timestamp.

 ** If a marker exists but the region server no longer has WALs, we discard the 
marker — the server might be decommissioned or no longer relevant.

 ** If the marker chore is disabled, we fall back to the WAL-only approach.

Finally, we compute the minimum of all relevant timestamps to get the 
checkpoint — ensuring we don’t skip over any unreplicated data.
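
The combining rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual patch: the function name, the plain dictionaries keyed by server name, and the {{None}} result for an empty queue set are all hypothetical, and timestamps are treated as plain integers.

```python
from typing import Dict, Optional


def compute_replication_checkpoint(
    wal_start_ts: Dict[str, int],
    marker_ts: Optional[Dict[str, int]] = None,
) -> Optional[int]:
    """Hypothetical sketch of the checkpoint rules described above.

    wal_start_ts: server name -> start timestamp of its oldest queued WAL.
    marker_ts:    server name -> last replicated timestamp from the marker
                  table, or None when the marker chore is disabled.
    """
    if not wal_start_ts:
        # Assumption: the comment does not say what happens when no server
        # has queued WALs, so this sketch reports "no checkpoint".
        return None

    per_server = []
    for server, wal_ts in wal_start_ts.items():
        candidate = wal_ts
        if marker_ts is not None:
            marker = marker_ts.get(server)
            # Use the marker only when it is ahead of the WAL start.
            if marker is not None and marker > wal_ts:
                candidate = marker
        per_server.append(candidate)

    # Markers for servers without queued WALs were never looked up,
    # which is how they get discarded.
    return min(per_server)


wal = {"rs1": 1000, "rs2": 4000}
markers = {"rs1": 6000, "rs2": 3500, "rs3": 10}  # rs3 has no WALs: ignored
print(compute_replication_checkpoint(wal, markers))  # 4000
print(compute_replication_checkpoint(wal, None))     # 1000 (WAL-only fallback)
```

In the first call {{rs1}} contributes its marker (6000), {{rs2}} keeps its WAL start (4000) because its marker is older, and the stale {{rs3}} marker never enters the minimum.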
h3. Why Use Both?

Relying on only one of the two sources creates problems:
 * WAL-Only Issues:
If a WAL is long-lived (not rolled for hours), the start timestamp remains 
stale, even if replication has actually progressed far ahead.
_Example:_ WAL {{WAL-001}} started at {{T1000}}. It’s still in the queue, 
but data has already been replicated up to {{T6000}}. If we only rely on 
WALs, the checkpoint gets stuck at {{T1000}}, unnecessarily delaying PITR.

 * Marker-Only Issues:
The marker chore may leave stale entries for decommissioned or reassigned 
servers.
_Example:_ {{rs1}} was handling WALs at {{T10}}. It gets decommissioned. 
Now {{rs2}} takes over the same data and has replicated it up to {{T20}}. 
However, {{rs1}}'s marker still says {{T10}}. If we blindly include 
that stale entry, the checkpoint incorrectly remains at {{T10}}, even 
though the system has moved forward.
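
To make the two examples concrete, here is a small worked comparison of the three strategies. It is a sketch with hypothetical helper names; the WAL start of 15 for {{rs2}} is an assumed value, since the example above only gives its replicated timestamp.

```python
def wal_only(wal_start):
    # Checkpoint from WAL start timestamps alone.
    return min(wal_start.values())


def marker_only(markers):
    # Checkpoint from marker timestamps alone, stale entries included.
    return min(markers.values())


def combined(wal_start, markers):
    # Per server with queued WALs: the later of WAL start and marker.
    # Servers that appear only in `markers` are ignored.
    return min(max(ts, markers.get(rs, ts)) for rs, ts in wal_start.items())


# Example 1: long-lived WAL, replication already at T6000.
print(wal_only({"rs1": 1000}))                 # 1000 (stuck)
print(combined({"rs1": 1000}, {"rs1": 6000}))  # 6000

# Example 2: rs1 decommissioned, rs2 took over.
# Assumption: rs2's oldest queued WAL started at 15.
markers = {"rs1": 10, "rs2": 20}
print(marker_only(markers))          # 10 (held back by the stale rs1 entry)
print(combined({"rs2": 15}, markers))  # 20
```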

So the approach deliberately errs on the side of caution: correctness comes 
first, even if that means a conservative (i.e., slower-moving) checkpoint.

That’s why we:
 * Use marker timestamps only if the server still has WALs (i.e., it's active),

 * Discard markers from servers not present in WAL queues (assuming they're no 
longer relevant),

 * Always take the minimum across valid sources to ensure no unreplicated data 
is skipped.

> Track the Age/Timestamp of the Last Successfully Backed-Up WAL Entry in 
> Continuous Backup Replication Endpoint
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29220
>                 URL: https://issues.apache.org/jira/browse/HBASE-29220
>             Project: HBase
>          Issue Type: Task
>          Components: backup&restore
>            Reporter: Vinayak Hegde
>            Priority: Major
>
> We use HBase’s replication framework for Continuous Backup through 
> {{ContinuousBackupReplicationEndpoint}}. This replicates WAL entries to 
> the backup location, which are then used for Point-In-Time Recovery (PITR) 
> and Incremental Backup (an optimization technique that collects WALs and 
> generates HFiles for faster recovery).
> However, the {{ReplicationEndpoint}} can lag behind in time.
> For example, if replication is one hour behind, 
> {{ContinuousBackupReplicationEndpoint}} will currently be writing WAL entries 
> that are one hour old. This means that if a user requests a PITR for the 
> current time or attempts an incremental backup, they will miss that one hour 
> of data.
> To prevent this, we need to ensure that users can only request data that has 
> been fully backed up. Therefore, we must track the timestamp of the last 
> successfully backed-up WAL entry:
>  * For PITR: Users should only be allowed to restore to a point before this 
> timestamp.
>  * For Incremental Backup: The incremental backup process should store this 
> timestamp as the backup time to maintain data consistency.
> This ensures data integrity and prevents users from requesting backups that 
> include unprocessed WAL entries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
