Jan Van Besien created HBASE-29905:
--------------------------------------
Summary: BackupLogCleaner retains old WAL files due to stale
entries in backup:system table
Key: HBASE-29905
URL: https://issues.apache.org/jira/browse/HBASE-29905
Project: HBase
Issue Type: Bug
Components: backup&restore
Reporter: Jan Van Besien
The backup:system table stores trslm: (table-region-server-log-map) rows with
the row key format: {{trslm:\0}}
Each row's value is a protobuf-serialized map of {{\{RegionServer → WAL timestamp}}},
representing the WAL position up to which each RegionServer has been backed
up for that table.
BackupLogCleaner uses this information to decide which WAL files to clean up, as
follows:
* During backup completion (FullTableBackupClient.java:192 /
IncrementalTableBackupClient.java:330), writeRegionServerLogTimestamp() writes
a trslm: row for each table in the backup, recording the latest WAL timestamp
per RS.
* Immediately after, readLogTimestampMap() (BackupSystemTable.java:802) scans
all trslm: rows for that backup root — every table that has ever been backed up
to that root, not just the tables in the current backup. This full map is
stored into the BackupInfo object (backupInfo.setTableSetTimestampMap(...)) and
persisted as part of the session: row in backup:system.
* BackupLogCleaner (BackupLogCleaner.java:89-142) reads the most recent
BackupInfo per backup root and iterates over its tableSetTimestampMap. For each
RegionServer found across all tables, it computes the minimum timestamp as the
"preservation boundary" for that server. WALs older than or equal to this
boundary can be deleted; newer ones are retained. A single stale table with a
year-old timestamp for any RS therefore pins the boundary for that RS a year
back, so none of that server's WALs written since then are ever cleaned up.
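To illustrate the boundary computation described above, here is a minimal standalone sketch. It is not the actual BackupLogCleaner code: the class name, method names, and table/server names are hypothetical, the maps use plain strings instead of HBase types, and treating WALs from unknown servers as non-deletable is a simplifying assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the real BackupLogCleaner implementation.
class WalBoundarySketch {

    /** For each RegionServer, take the minimum timestamp seen across all tables. */
    static Map<String, Long> boundaries(Map<String, Map<String, Long>> tableToRsTimestamps) {
        Map<String, Long> result = new HashMap<>();
        for (Map<String, Long> rsTimestamps : tableToRsTimestamps.values()) {
            for (Map.Entry<String, Long> e : rsTimestamps.entrySet()) {
                // Keep the smaller of the existing and the new timestamp.
                result.merge(e.getKey(), e.getValue(), Long::min);
            }
        }
        return result;
    }

    /** A WAL is deletable if its timestamp is at or before its server's boundary. */
    static boolean canDelete(Map<String, Long> boundaries, String server, long walTimestamp) {
        Long boundary = boundaries.get(server);
        // Assumption of this sketch: WALs of servers without a boundary are kept.
        return boundary != null && walTimestamp <= boundary;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> tableToRs = new HashMap<>();
        tableToRs.put("active_table", Map.of("rs1", 2_000L, "rs2", 2_100L));
        tableToRs.put("stale_table", Map.of("rs1", 100L)); // never backed up again

        Map<String, Long> b = boundaries(tableToRs);
        System.out.println(b.get("rs1"));                // 100
        System.out.println(canDelete(b, "rs1", 1_500L)); // false: pinned by stale_table
        System.out.println(canDelete(b, "rs2", 1_500L)); // true
    }
}
```

In this example the stale entry for rs1 drags its boundary down to 100, so a WAL at timestamp 1500 is retained even though the active table was backed up well past it.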
The root cause is that there is no code anywhere that deletes trslm: rows. They
are only written (overwritten) when a backup runs for that specific table. Two
scenarios create stale rows:
* (a) Table removed from backup (because the table is no longer included in
backups or simply because the table has been deleted).
* (b) RegionServer decommissioned.
Problem (a) was observed in production.
To fix this, I think we need to have a cleanup mechanism. Perhaps we can filter
readLogTimestampMap() results to only include tables in the current backup
info, and delete everything else (or only filter, without delete, but then the
stale entries still remain in the table).
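A rough sketch of the filtering idea, assuming the timestamp map is keyed by table name. The class and method names are hypothetical and this is not a patch against BackupSystemTable; it only shows how the entries to keep could be separated from the stale ones that would be candidates for deletion.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the proposed filter, not actual HBase code.
class TrslmFilterSketch {

    /** Keep only the timestamp entries of tables that are part of the current backup. */
    static Map<String, Map<String, Long>> retainCurrent(
            Map<String, Map<String, Long>> allTables, Set<String> currentTables) {
        Map<String, Map<String, Long>> kept = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> e : allTables.entrySet()) {
            if (currentTables.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    /** Tables that have an entry but are not in the current backup: deletion candidates. */
    static Set<String> staleTables(
            Map<String, Map<String, Long>> allTables, Set<String> currentTables) {
        Set<String> stale = new TreeSet<>(allTables.keySet());
        stale.removeAll(currentTables);
        return stale;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> all = Map.of(
            "active_table", Map.of("rs1", 2_000L),
            "stale_table", Map.of("rs1", 100L));
        Set<String> current = Set.of("active_table");

        System.out.println(retainCurrent(all, current).keySet()); // [active_table]
        System.out.println(staleTables(all, current));            // [stale_table]
    }
}
```

With only the filter applied, the stale rows would no longer influence the boundary but would remain in the table; deleting the rows returned by the second method would remove them as well.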