Dieter De Paepe created HBASE-29604: ---------------------------------------
Summary: BackupHFileCleaner uses flawed time based check
Key: HBASE-29604
URL: https://issues.apache.org/jira/browse/HBASE-29604
Project: HBase
Issue Type: Bug
Components: backup&restore
Affects Versions: 3.0.0-beta-1, 2.6.0, 4.0.0-alpha-1
Reporter: Dieter De Paepe

BackupHFileCleaner is responsible for preventing the cleanup of bulkloaded HFiles that are still required by the backup & restore mechanism. It does this using 2 checks:
* The backupsystemtable stores which HFile bulk loads are required for the next incremental backup. Any HFile present there cannot be deleted.
* A time-based check prevents recently created HFiles from being deleted.

The intention of the time-based check is to avoid deleting HFiles newer than the previous run of the cleaner. I believe this is meant to avoid race conditions between the cleaner and entries in the backupsystemtable that are created while the cleaner is running.

In a single-threaded context, this works correctly. However, the cleaner is actually used concurrently in the hfile_cleaner-dir-scan-pool to scan multiple subdirectories in `CleanerChore#traverseAndDelete` (line 492). This means the time-based check is not guaranteed to protect recently created HFiles. If an HFile is wrongly deleted, this carries a small risk of data loss in a backup.

I also strongly suggest documenting in FileCleanerDelegate that implementations should be thread-safe.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
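The failure mode can be illustrated with a deliberately simplified model. It is a sketch under stated assumptions, not the HBase source: `TimeGuardModel`, `startScan` and `isDeletable` are hypothetical names, and the model assumes each directory scan records its start time in one shared, unsynchronized "previous run" field and only deletes files older than that value. Two runs one period apart give a cutoff a full period in the past. When two scans from the dir-scan pool overlap, however, the later one can treat the earlier one, started moments before, as its "previous run". That collapses the safety margin, so an HFile whose backupsystemtable entry is not yet visible becomes deletable. The calls below play out one possible interleaving deterministically.

```java
import java.util.Set;

/**
 * Hypothetical, simplified model of a time-based guard that only allows deleting
 * files older than the cleaner's "previous run". Not the HBase BackupHFileCleaner code.
 */
public class TimeGuardModel {

  // Shared, unsynchronized state: well-defined for one caller, ambiguous for overlapping scans.
  private long previousRun = 0L;

  /** Hypothetical: called at the start of each scan; returns the cutoff this scan uses. */
  public long startScan(long now) {
    long cutoff = previousRun; // the "previous run" as seen by this scan
    previousRun = now;         // becomes the "previous run" for whoever calls next
    return cutoff;
  }

  /** A file is deletable only if it is older than the cutoff and not referenced. */
  public static boolean isDeletable(long modTime, String hfile, Set<String> referenced, long cutoff) {
    return modTime < cutoff && !referenced.contains(hfile);
  }

  public static void main(String[] args) {
    // HFile written at t=1999 whose backup table entry is not yet visible to the scan.
    long fileTime = 1999L;
    String hfile = "hfile-a";
    Set<String> referenced = Set.of();

    // Runs one period apart: the scan at t=2000 uses cutoff 1000, so the file survives.
    TimeGuardModel sequential = new TimeGuardModel();
    sequential.startScan(1000L);
    long seqCutoff = sequential.startScan(2000L);
    System.out.println("sequential cutoff=" + seqCutoff + ", deletable="
        + isDeletable(fileTime, hfile, referenced, seqCutoff));

    // Overlapping scans started at t=2000 (dir A) and t=2001 (dir B) by a thread pool.
    TimeGuardModel concurrent = new TimeGuardModel();
    concurrent.startScan(1000L);
    concurrent.startScan(2000L);                   // scan of dir A
    long dirBCutoff = concurrent.startScan(2001L); // dir B sees dir A as its "previous run"
    System.out.println("concurrent cutoff=" + dirBCutoff + ", deletable="
        + isDeletable(fileTime, hfile, referenced, dirBCutoff));
  }
}
```

With real threads the ordering is arbitrary and the plain `long` field is also a data race, so the effective cutoff is even less predictable than in this deterministic replay.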
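One way a delegate could tolerate concurrent scans is to fix the cutoff once per cleaner run and let the scan threads only read it. The sketch below is hypothetical: `RunScopedTimeGuard`, `beginRun` and `isDeletable` are illustrative names, and it assumes some hook runs once on the chore thread before any directory scan is dispatched. It does not claim that FileCleanerDelegate exposes such a hook under that name. It still relies on the original assumption that a bulk load's backupsystemtable entry becomes visible within one cleaner period.

```java
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical thread-safe variant: the cutoff is computed once per cleaner run
 * and only read by concurrent directory scans.
 */
public class RunScopedTimeGuard {

  private long lastRunStart = 0L;    // guarded by 'this', touched only in beginRun
  private volatile long cutoff = 0L; // published to all scan threads

  /** Hypothetical hook, called once per run before any directory scan starts. */
  public synchronized void beginRun(long now) {
    cutoff = lastRunStart;
    lastRunStart = now;
  }

  /** Safe from many threads: reads the run-scoped cutoff and mutates nothing. */
  public boolean isDeletable(long modTime, String hfile, Set<String> referenced) {
    return modTime < cutoff && !referenced.contains(hfile);
  }

  public static void main(String[] args) throws InterruptedException {
    RunScopedTimeGuard guard = new RunScopedTimeGuard();
    guard.beginRun(1000L);
    guard.beginRun(2000L); // every scan in this run uses cutoff 1000

    ExecutorService pool = Executors.newFixedThreadPool(4);
    AtomicInteger deleted = new AtomicInteger();
    for (int dir = 0; dir < 8; dir++) {
      pool.submit(() -> {
        // All scans share the same cutoff, so the t=1999 file is protected in each of them.
        if (guard.isDeletable(1999L, "hfile-a", Set.of())) {
          deleted.incrementAndGet();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("deleted=" + deleted.get());
  }
}
```

The design choice is to separate the single writer (the run hook) from the many readers (the scans), so overlapping scans can no longer shift each other's notion of "previous run".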