[
https://issues.apache.org/jira/browse/HBASE-29800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HBASE-29800:
-----------------------------------
Labels: pull-request-available (was: )
> WAL logs are unprotected during first full backup
> -------------------------------------------------
>
> Key: HBASE-29800
> URL: https://issues.apache.org/jira/browse/HBASE-29800
> Project: HBase
> Issue Type: Bug
> Components: backup&restore
> Reporter: Dieter De Paepe
> Priority: Major
> Labels: pull-request-available
>
> There is a small window during the creation of the first full backup in the
> first/only backup root where WAL logs might be eligible for deletion, which
> could lead to data loss in subsequent incremental backups.
> Pseudo code for this scenario is as follows (see
> FullTableBackupClient#execute):
> {code:java}
> // This is our first backup. Let's put some marker to system table so that we
> // can hold the logs while we do the backup.
> backupManager.writeBackupStartCode(0L);
> // Roll the WALs
> BackupUtils.logRoll(...);
> snapshotAndCopyTables();
> backupManager.writeBackupStartCode(newStartCode);
> // Register the backupInfo as completed
> completeBackup(...);{code}
> The comment on the "0" backupStartCode suggests that it prevents WAL deletion
> until the backup is completed, but this is not the case.
> The component responsible for preventing WAL deletion for backups is
> BackupLogCleaner. While the log cleaner does read & use the backup start
> codes, it only does so for backups that are already completed:
> {code:java}
> // true means only include completed backups
> List<BackupInfo> backups = sysTable.getBackupHistory(true); {code}
> So the log cleaner will not even be aware of the backup root.
> I believe this creates a risk of data loss in the following incremental
> backup: if a log roll happens after a table has been snapshotted but before
> the backup is completed, and the log cleaner then runs, WALs that the
> incremental backup still needs may be deleted.
> The simplest fix is probably to have the log cleaner also use in-progress
> backupInfos to calculate the startCode.
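
To make the window and the proposed fix concrete, here is a minimal toy model of the retention decision. It is a sketch under stated assumptions, not the HBase implementation: WalRetentionSketch, Backup, State, canDelete and the includeInProgress flag are hypothetical names, and the rule "a WAL is deletable when no visible backup has a start code at or below its timestamp" is a simplification of what BackupLogCleaner does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every type and method here is hypothetical, not HBase code.
class WalRetentionSketch {
    enum State { RUNNING, COMPLETE }

    // A backup and the start code (timestamp) from which its WALs must be kept.
    static final class Backup {
        final State state;
        final long startCode;

        Backup(State state, long startCode) {
            this.state = state;
            this.startCode = startCode;
        }
    }

    // Returns true if a WAL with the given timestamp may be deleted.
    // Assumption: when the cleaner sees no backups, nothing protects the WAL.
    static boolean canDelete(long walTimestamp, List<Backup> backups,
                             boolean includeInProgress) {
        List<Backup> visible = new ArrayList<>();
        for (Backup b : backups) {
            if (includeInProgress || b.state == State.COMPLETE) {
                visible.add(b);
            }
        }
        if (visible.isEmpty()) {
            return true;
        }
        // Keep every WAL at or after the oldest visible start code.
        long boundary = Long.MAX_VALUE;
        for (Backup b : visible) {
            boundary = Math.min(boundary, b.startCode);
        }
        return walTimestamp < boundary;
    }

    public static void main(String[] args) {
        // The first full backup is still running; the "0" marker was written.
        List<Backup> backups = List.of(new Backup(State.RUNNING, 0L));
        long walAfterSnapshot = 100L;

        // Completed backups only: the running backup is invisible to the cleaner.
        System.out.println(canDelete(walAfterSnapshot, backups, false)); // true
        // In-progress backups included: the "0" marker now holds the WAL.
        System.out.println(canDelete(walAfterSnapshot, backups, true));  // false
    }
}
```

In this model the "0" start code only has an effect once the cleaner also looks at running backups, which is the behavior the suggested fix aims for.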
--
This message was sent by Atlassian Jira
(v8.20.10#820010)