Dieter De Paepe created HBASE-29604:
---------------------------------------

             Summary: BackupHFileCleaner uses flawed time based check
                 Key: HBASE-29604
                 URL: https://issues.apache.org/jira/browse/HBASE-29604
             Project: HBase
          Issue Type: Bug
          Components: backup&restore
    Affects Versions: 3.0.0-beta-1, 2.6.0, 4.0.0-alpha-1
            Reporter: Dieter De Paepe


BackupHFileCleaner is responsible for preventing the cleanup of bulkloaded 
HFiles that are still required by the backup & restore mechanism. It does this 
using 2 checks:
 * The backupsystemtable stores which HFile bulk loads are required for the 
next incremental backup. Any HFile present here cannot be deleted.
 * A time-based check is present to prevent recently created HFiles from being 
deleted. The intention is to avoid deleting HFiles newer than the previous 
run of the cleaner. I believe this is to avoid race conditions between the 
cleaner and entries in the backupsystemtable that get created while the cleaner 
is running.

In a single-threaded context, this works correctly.

However, the cleaner is actually used concurrently: the 
hfile_cleaner-dir-scan-pool scans multiple subdirectories in parallel in 
`CleanerChore#traverseAndDelete` (line 492). This means the time-based check is 
not guaranteed to protect recently created HFiles. There is a small chance of 
data loss (in a backup) if an HFile is wrongly deleted as a result.
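
To illustrate how concurrent use can defeat a time-based check of this kind, 
here is a minimal sketch. It assumes the cleaner remembers the times of its 
last two reads of the backupsystemtable and protects any file newer than the 
older of the two. The class and method names are hypothetical, and this is a 
simplified model, not the actual BackupHFileCleaner code. The interleaving of 
two pool threads is simulated sequentially so the outcome is deterministic.

```java
public class TimeCheckSketch {
    // Hypothetical stand-ins for the timestamps of the last two table reads.
    private long prevRead = 0L;
    private long secondPrevRead = 0L;

    /** Models one read of the backup system table at time 'now'. */
    public void recordRead(long now) {
        secondPrevRead = prevRead;
        prevRead = now;
    }

    /** A file passes the time check (is kept) if it is newer than the second-to-last read. */
    public boolean isProtectedByTime(long fileModificationTime) {
        return fileModificationTime > secondPrevRead;
    }

    /** One read per cleaner run: a file created between the runs stays protected. */
    public static boolean sequentialCase() {
        TimeCheckSketch cleaner = new TimeCheckSketch();
        cleaner.recordRead(100); // cleaner run 1
        cleaner.recordRead(200); // cleaner run 2
        return cleaner.isProtectedByTime(150);
    }

    /** Two pool threads read during the same run: the protection window collapses. */
    public static boolean interleavedCase() {
        TimeCheckSketch cleaner = new TimeCheckSketch();
        cleaner.recordRead(100); // cleaner run 1
        cleaner.recordRead(200); // cleaner run 2, thread A
        cleaner.recordRead(201); // cleaner run 2, thread B
        return cleaner.isProtectedByTime(150);
    }

    public static void main(String[] args) {
        System.out.println("sequential: protected = " + sequentialCase());
        System.out.println("interleaved: protected = " + interleavedCase());
    }
}
```

In this model, the second read within the same run shifts the threshold from 
the previous run (100) to the current one (200), so a file created at 150 is 
no longer covered by the time check.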

I also strongly suggest adding a note to FileCleanerDelegate stating that 
implementations should be thread-safe.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
