[
https://issues.apache.org/jira/browse/HBASE-29261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944425#comment-17944425
]
Vinayak Hegde edited comment on HBASE-29261 at 4/22/25 12:08 PM:
-----------------------------------------------------------------
h2. Objective of New Logic
Instead of checking if a backup is the _only_ valid one for all PITR points, we
aim to:
* *Identify the exact PITR time range* that a particular full backup can
support as a base.
* If *another full backup* also covers that same range, then the current
backup is *redundant* and can be deleted.
h2. Terminologies
* {{ct}} (current time): Current timestamp
* {{mapt}} (maxAllowedPITRTime): Maximum time range to which the user can
perform PITR. Typically configured at cluster level (e.g., 30 days ago)
* {{cst}} (continuousBackupStartTime): The earliest time from which continuous
backup data is available for a table
* {{{}fs{}}}: Full backup start time (not reliably usable)
* {{{}fm{}}}: Time when the actual snapshot (logical freeze) was taken —
slightly later than {{fs}} (we don’t have access to this)
* {{{}fe{}}}: Full backup end time (we will use this instead of {{fm}} for
conservative estimation)
h2. Assumptions
* A full backup captures the table data as of {{{}fm{}}}, which lies somewhere
after {{{}fs{}}}.
* After the snapshot is taken ({{{}fm{}}}), data may be added to the table
even before the backup finishes ({{{}fe{}}}). But since the snapshot freezes
the state, the backup only contains data up to {{{}fm{}}}.
* *Since we don’t have access to {{{}fm{}}}, we will conservatively use {{fe}}
to determine the lower bound of data that the backup might reliably support.*
h2. Case Analysis
*Timeline & Notation Explanation*
Timeline
{code:java}
cst mapt ct
--------------------|--------------------|----------------------------------|
{code}
Backup Notation:
{code:java}
|---|------|
^ ^ ^
| | |
fs fm fe{code}
----
h4. *Case 1: {{continuousBackupStartTime < maxAllowedPITRTime}}*
This means we have continuous backup coverage even before the PITR time limit.
In this case, the PITR window spans from {{mapt}} to {{{}ct{}}}.
{code:java}
cst mapt ct
--------------------|--------------------|----------------------------------|
a. |--|----|
b. |--|----|
c. |--|----|
d. |--|----|
e. |--|----|
f. |--|----|
g. |--|----| {code}
We evaluate different full backups based on their {{{}fe{}}}:
* If {{{}fe < cst{}}}: The backup does not cover any PITR points. After
restoring this backup, the table is at a point ({{{}fm{}}}) before {{{}cst{}}}.
Since continuous backup is only available from {{{}cst{}}}, we cannot reach any
point in the PITR range.
* If {{{}cst <= fe < mapt{}}}: Continuous backup exists from {{{}cst{}}}, so
we can recover from {{mapt}} onwards, but not before. Hence, this backup can
support {{{}[mapt, ct]{}}}.
* If {{{}fe >= mapt{}}}: We can recover from {{fe}} onwards, so this backup
supports {{{}[fe, ct]{}}}.
*Conclusion for Case 1:*
* Let {{effectiveStart = mapt}}
* If {{{}fe >= cst{}}}: PITR range supported = {{[max(fe, effectiveStart),
ct]}}
* Else: Backup does not cover any PITR points
----
h4. *Case 2: {{maxAllowedPITRTime < continuousBackupStartTime}}*
Here, the user has configured PITR for a time range earlier than we have data
for. So even though the system allows recovery up to {{{}ct - 30 days{}}}, we
only have continuous backup data from a later time ({{{}cst{}}}), and that
limits what is actually recoverable.
{code:java}
mapt cst ct
--------------------|--------------------|----------------------------------|
a. |--|----|
b. |--|----|
c. |--|----|
d. |--|----|
e. |--|----|
f. |--|----|
g. |--|----| {code}
* Since PITR recovery is only possible from {{{}cst{}}}, {{mapt}} becomes
irrelevant.
* The effective PITR window is {{[cst, ct]}}
* If {{{}fe >= cst{}}}: PITR range supported = {{[fe, ct]}}
* Else: No recovery possible with this backup
*Conclusion for Case 2:*
* Let {{effectiveStart = cst}}
* If {{{}fe >= effectiveStart{}}}: PITR range supported = {{[max(fe,
effectiveStart), ct]}}
* Else: Backup does not cover any PITR points
----
h4. *Case 3: {{continuousBackupStartTime == maxAllowedPITRTime}}*
This is effectively the same as Case 2 in behavior, with the recovery window
starting from a single defined time.
{code:java}
cst,mapt ct
-------------------------------|----------------------------------|
a. |--|----|
b. |--|----|
c. |--|----|
d. |--|----| {code}
* If {{{}fe >= cst{}}}: PITR range supported = {{[fe, ct]}}
* Else: No PITR support
*Conclusion for Case 3:*
* Let {{effectiveStart = cst}} (which is equal to {{{}mapt{}}})
* If {{{}fe >= effectiveStart{}}}: PITR range supported = {{[max(fe,
effectiveStart), ct]}}
* Else: Backup does not cover any PITR points
h2. Final General Rule (Applicable in All Cases)
{code:java}
effectiveStart = max(cst, mapt)
If fe >= cst:
PITR range = [max(fe, effectiveStart), ct]
Else:
PITR range = None (backup does not help with PITR) {code}
was (Author: JIRAUSER298877):
h2. Objective of New Logic
Instead of checking if a backup is the _only_ valid one for all PITR points, we
aim to:
* *Identify the exact PITR time range* that a particular full backup can
support as a base.
* If *another full backup* also covers that same range, then the current
backup is *redundant* and can be deleted.
h2. Terminologies
* {{ct}} (current time): Current timestamp
* {{mapt}} (maxAllowedPITRTime): Maximum time range to which the user can
perform PITR. Typically configured at cluster level (e.g., 30 days ago)
* {{cst}} (continuousBackupStartTime): The earliest time from which continuous
backup data is available for a table
* {{{}fs{}}}: Full backup start time (not reliably usable)
* {{{}fm{}}}: Time when the actual snapshot (logical freeze) was taken —
slightly later than {{fs}} (we don’t have access to this)
* {{{}fe{}}}: Full backup end time (we will use this instead of {{fm}} for
conservative estimation)
h2. Assumptions
* A full backup captures the table data as of {{{}fm{}}}, which lies somewhere
after {{{}fs{}}}.
* After the snapshot is taken ({{{}fm{}}}), data may be added to the table
even before the backup finishes ({{{}fe{}}}). But since the snapshot freezes
the state, the backup only contains data up to {{{}fm{}}}.
* *Since we don’t have access to {{{}fm{}}}, we will conservatively use {{fe}}
to determine the lower bound of data that the backup might reliably support.*
h2. Case Analysis
*Timeline & Notation Explanation*
Timeline
{code:java}
cst mapt ct
--------------------|--------------------|----------------------------------|
{code}
Backup Notation:
{code:java}
|---|------|
^ ^ ^
| | |
fs fm fe{code}
----
h4. *Case 1: {{continuousBackupStartTime < maxAllowedPITRTime}}*
This means we have continuous backup coverage even before the PITR time limit.
In this case, the PITR window spans from {{mapt}} to {{{}ct{}}}.
{code:java}
cst mapt ct
--------------------|--------------------|----------------------------------|
a. |--|----|
b. |--|----|
c. |--|----|
d. |--|----|
e. |--|----|
f. |--|----|
g. |--|----| {code}
We evaluate different full backups based on their {{{}fe{}}}:
* If {{{}fe < cst{}}}: The backup does not cover any PITR points. After
restoring this backup, the table is at a point ({{{}fm{}}}) before {{{}cst{}}}.
Since continuous backup is only available from {{{}cst{}}}, we cannot reach any
point in the PITR range.
* If {{{}cst <= fe < mapt{}}}: Continuous backup exists from {{{}cst{}}}, so
we can recover from {{mapt}} onwards, but not before. Hence, this backup can
support {{{}[mapt, ct]{}}}.
* If {{{}fe >= mapt{}}}: We can recover from {{fe}} onwards, so this backup
supports {{{}[fe, ct]{}}}.
*Conclusion for Case 1:*
* Let {{effectiveStart = mapt}}
* If {{{}fe >= effectiveStart{}}}: PITR range supported = {{[max(fe,
effectiveStart), ct]}}
* Else: Backup does not cover any PITR points
----
h4. *Case 2: {{maxAllowedPITRTime < continuousBackupStartTime}}*
Here, the user has configured PITR for a time range earlier than we have data
for. So even though the system allows recovery up to {{{}ct - 30 days{}}}, we
only have continuous backup data from a later time ({{{}cst{}}}), and that
limits what is actually recoverable.
{code:java}
mapt cst ct
--------------------|--------------------|----------------------------------|
a. |--|----|
b. |--|----|
c. |--|----|
d. |--|----|
e. |--|----|
f. |--|----|
g. |--|----| {code}
* Since PITR recovery is only possible from {{{}cst{}}}, {{mapt}} becomes
irrelevant.
* The effective PITR window is {{[cst, ct]}}
* If {{{}fe >= cst{}}}: PITR range supported = {{[fe, ct]}}
* Else: No recovery possible with this backup
*Conclusion for Case 2:*
* Let {{effectiveStart = cst}}
* If {{{}fe >= effectiveStart{}}}: PITR range supported = {{[max(fe,
effectiveStart), ct]}}
* Else: Backup does not cover any PITR points
----
h4. *Case 3: {{continuousBackupStartTime == maxAllowedPITRTime}}*
This is effectively the same as Case 2 in behavior, with the recovery window
starting from a single defined time.
{code:java}
cst,mapt ct
-------------------------------|----------------------------------|
a. |--|----|
b. |--|----|
c. |--|----|
d. |--|----| {code}
* If {{{}fe >= cst{}}}: PITR range supported = {{[fe, ct]}}
* Else: No PITR support
*Conclusion for Case 3:*
* Let {{effectiveStart = cst}} (which is equal to {{{}mapt{}}})
* If {{{}fe >= effectiveStart{}}}: PITR range supported = {{[max(fe,
effectiveStart), ct]}}
* Else: Backup does not cover any PITR points
h2. Final General Rule (Applicable in All Cases)
{code:java}
effectiveStart = max(cst, mapt)
If fe >= effectiveStart:
PITR range = [max(fe, effectiveStart), ct]
Else:
PITR range = None (backup does not help with PITR) {code}
> Investigate flaw in backup deletion validation of PITR-critical backups and
> propose correct approach
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-29261
> URL: https://issues.apache.org/jira/browse/HBASE-29261
> Project: HBase
> Issue Type: Task
> Components: backup&restore
> Reporter: Vinayak Hegde
> Assignee: Vinayak Hegde
> Priority: Major
>
> This Jira investigates a flaw in our current logic used to validate whether a
> full backup—potentially critical for PITR (Point-In-Time Recovery)—can be
> safely deleted.
> The current approach incorrectly checks whether a full backup is the only
> valid base for *all* PITR target points, which is not a valid criterion. A
> full backup should not be required to support _all_ PITR points to be
> considered necessary. Instead, each full backup only contributes to a
> {*}specific PITR time range{*}, depending on when the backup was taken and
> the availability of continuous backups afterward.
> This ticket proposes a more accurate and conservative approach:
> * Determine the PITR range each full backup can support.
> * Identify if another full backup exists that fully covers the same range.
> * If such a backup exists, the original one can be considered safe for
> deletion.
> All edge cases and reasoning are explained in the comments for clarity.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)