[ https://issues.apache.org/jira/browse/HBASE-29800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-29800:
-----------------------------------
    Labels: pull-request-available  (was: )

> WAL logs are unprotected during first full backup
> -------------------------------------------------
>
>                 Key: HBASE-29800
>                 URL: https://issues.apache.org/jira/browse/HBASE-29800
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>            Reporter: Dieter De Paepe
>            Priority: Major
>              Labels: pull-request-available
>
> There is a small window during the creation of the first full backup in the
> first/only backup root where WAL logs might be eligible for deletion, which
> could lead to data loss in subsequent incremental backups.
> Pseudocode for this scenario is as follows (see
> FullTableBackupClient#execute):
> {code:java}
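> // This is our first backup. Let's put some marker to system table so that we
> // can hold the logs while we do the backup.
> backupManager.writeBackupStartCode(0L);
> // Roll the WALs
> BackupUtils.logRoll(...);
> snapshotAndCopyTables();
> // Window: this backup is not completed yet, so BackupLogCleaner ignores it and
> // a WAL rolled here may be deleted before completeBackup(...) runs.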
> backupManager.writeBackupStartCode(newStartCode);
> // Register the backupInfo as completed
> completeBackup(...);{code}
> The comment on the "0" backupStartCode suggests that it prevents WAL deletion
> until the backup is completed, but this is not the case.
> The component responsible for preventing WAL deletion for backups is
> BackupLogCleaner. While the log cleaner does read and use the backup start
> codes, it only does so for backups that are already completed:
> {code:java}
> // true means only include completed backups
> List<BackupInfo> backups = sysTable.getBackupHistory(true); {code}
> So, until this first backup completes, the log cleaner is not even aware of
> the backup root.
> I believe this means the next incremental backup risks data loss if, after a
> table has been snapshotted but before the backup is completed, a log roll
> occurs and the log cleaner runs.
> The simplest fix is probably to have the log cleaner also use in-progress
> backupInfos when calculating the startCode.
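
A minimal, hypothetical Java model may help illustrate both the window and the proposed fix. BackupRecord, State, preservationBoundary and isDeletable are made-up names, not HBase API, and the backup root path is a placeholder. The sketch assumes the cleaner protects every WAL at or after the smallest start code among the backups it takes into account. It deliberately ignores the per-root and per-region-server bookkeeping the real BackupLogCleaner performs.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical, simplified model of the WAL retention check described in HBASE-29800.
// None of these names are HBase API; per-root and per-server details are ignored.
public class WalRetentionSketch {

  enum State { RUNNING, COMPLETE }

  // One backup history entry: backup root, state, and start code (epoch millis).
  record BackupRecord(String backupRoot, State state, long startCode) {}

  // Smallest start code among the backups the cleaner takes into account,
  // or empty when it takes none into account (then no WAL is protected).
  static OptionalLong preservationBoundary(List<BackupRecord> history, boolean includeInProgress) {
    return history.stream()
        .filter(b -> includeInProgress || b.state() == State.COMPLETE)
        .mapToLong(BackupRecord::startCode)
        .min();
  }

  // A WAL may be deleted when no backup protects it or when it is older than the boundary.
  static boolean isDeletable(long walTimestamp, OptionalLong boundary) {
    return boundary.isEmpty() || walTimestamp < boundary.getAsLong();
  }

  public static void main(String[] args) {
    // First full backup of the only backup root: start code 0 written, still running.
    List<BackupRecord> history =
        List.of(new BackupRecord("hdfs:///backup", State.RUNNING, 0L));
    // Hypothetical timestamp of a WAL rolled after the snapshot, before completion.
    long walTimestamp = 1_700_000_000_000L;

    boolean completedOnly = isDeletable(walTimestamp, preservationBoundary(history, false));
    boolean withInProgress = isDeletable(walTimestamp, preservationBoundary(history, true));
    System.out.println("completed backups only: deletable=" + completedOnly); // true
    System.out.println("including in-progress:  deletable=" + withInProgress); // false
  }
}
```

In this model, counting only completed backups leaves no boundary at all, so the WAL is deletable; including the in-progress backup yields a boundary of 0, which protects every WAL, matching what the "0" marker was intended to do.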



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
