Jan Van Besien created HBASE-29905:
--------------------------------------

             Summary: BackupLogCleaner retains old WAL files due to stale 
entries in system:backup table
                 Key: HBASE-29905
                 URL: https://issues.apache.org/jira/browse/HBASE-29905
             Project: HBase
          Issue Type: Bug
          Components: backup&restore
            Reporter: Jan Van Besien


The backup:system table stores trslm: (table-region-server-log-map) rows with 
the row key format: {{trslm:\0}}

Each row's value is a protobuf-serialized map of {{\{RegionServer → WAL 
timestamp}}}, representing the WAL position up to which each RegionServer has 
been backed up for that table.

BackupLogCleaner uses this information to decide which WAL files to clean up, 
as follows:
 * During backup completion (FullTableBackupClient.java:192 / 
IncrementalTableBackupClient.java:330), writeRegionServerLogTimestamp() writes 
a trslm: row for each table in the backup, recording the latest WAL timestamp 
per RS.
 * Immediately after, readLogTimestampMap() (BackupSystemTable.java:802) scans 
all trslm: rows for that backup root — every table that has ever been backed up 
to that root, not just the tables in the current backup. This full map is 
stored into the BackupInfo object (backupInfo.setTableSetTimestampMap(...)) and 
persisted as part of the session: row in backup:system.
 * BackupLogCleaner (BackupLogCleaner.java:89-142) reads the most recent 
BackupInfo per backup root and iterates over its tableSetTimestampMap. For each 
RegionServer found across all tables, it computes the minimum timestamp as the 
"preservation boundary" for that server. WALs older than or equal to this 
boundary can be deleted; newer ones are retained. A single stale table with a 
year-old timestamp for any RS will pin WAL retention for that RS all the way 
back, preventing WAL cleanup.
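
To illustrate the boundary computation described above, here is a minimal Java sketch. It is a simplified model under stated assumptions, not the actual BackupLogCleaner code: the class WalBoundarySketch, its method names, the table and server names, and the timestamps are all hypothetical. It also assumes that WALs of a server with no entry in the map are retained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the boundary logic; not the real HBase code.
class WalBoundarySketch {

    /**
     * Input: table name -> (RegionServer address -> last backed-up WAL timestamp).
     * Output: per RegionServer, the minimum timestamp found across all tables.
     */
    static Map<String, Long> preservationBoundaries(
            Map<String, Map<String, Long>> tableSetTimestampMap) {
        Map<String, Long> boundaries = new HashMap<>();
        for (Map<String, Long> rsTimestamps : tableSetTimestampMap.values()) {
            for (Map.Entry<String, Long> e : rsTimestamps.entrySet()) {
                // Keep the smallest timestamp seen so far for this server.
                boundaries.merge(e.getKey(), e.getValue(), Long::min);
            }
        }
        return boundaries;
    }

    /** A WAL is deletable only if it is at or below the boundary for its server. */
    static boolean canDelete(Map<String, Long> boundaries, String server, long walTimestamp) {
        Long boundary = boundaries.get(server);
        // Assumption of this sketch: no entry for the server means "retain".
        return boundary != null && walTimestamp <= boundary;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> map = new HashMap<>();
        map.put("active_table", Map.of("rs1:16020", 2_000L));
        map.put("stale_table", Map.of("rs1:16020", 100L)); // never backed up again

        Map<String, Long> boundaries = preservationBoundaries(map);
        System.out.println(boundaries.get("rs1:16020"));                 // prints 100
        System.out.println(canDelete(boundaries, "rs1:16020", 1_500L)); // prints false
    }
}
```

In this example the WAL with timestamp 1500 is already covered by the backup of active_table, yet it is retained because the stale row keeps the minimum for rs1 at 100.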

The root cause is that there is no code anywhere that deletes trslm: rows. They 
are only written (overwritten) when a backup runs for that specific table. Two 
scenarios create stale rows:
 * (a) Table removed from backup (because the table is no longer included in 
backups or simply because the table was deleted).
 * (b) RegionServer decommissioned.

Scenario (a) was observed in production.

To fix this, I think we need a cleanup mechanism. Perhaps we can filter 
readLogTimestampMap() results to only include tables in the current backup 
info, and delete the trslm: rows for everything else. Alternatively, we could 
only filter without deleting, but then the stale entries would remain in the 
table.
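
A rough Java sketch of the filtering idea, using plain maps and sets rather than the real BackupSystemTable API. The class TrslmFilterSketch and both method names are hypothetical; the real fix would have to operate on the trslm: rows and the BackupInfo object, which is not shown here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed filtering; not the real HBase code.
class TrslmFilterSketch {

    /** Returns a copy of the map that keeps only the tables in the current backup. */
    static Map<String, Map<String, Long>> retainCurrentTables(
            Map<String, Map<String, Long>> allRows, Set<String> tablesInBackup) {
        Map<String, Map<String, Long>> filtered = new HashMap<>(allRows);
        filtered.keySet().retainAll(tablesInBackup);
        return filtered;
    }

    /** Returns the tables whose rows are candidates for deletion. */
    static Set<String> staleTables(
            Map<String, Map<String, Long>> allRows, Set<String> tablesInBackup) {
        Set<String> stale = new HashSet<>(allRows.keySet());
        stale.removeAll(tablesInBackup);
        return stale;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> rows = new HashMap<>();
        rows.put("active_table", Map.of("rs1:16020", 2_000L));
        rows.put("stale_table", Map.of("rs1:16020", 100L));
        Set<String> current = Set.of("active_table");

        System.out.println(retainCurrentTables(rows, current).keySet()); // prints [active_table]
        System.out.println(staleTables(rows, current));                  // prints [stale_table]
    }
}
```

The two methods correspond to the two variants: filtering alone changes what BackupLogCleaner sees, while the stale set identifies the rows that would additionally be deleted.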



