Bryan Beaudreault created HBASE-28423:
-----------------------------------------
Summary: Improvements to backup of bulkloaded files
Key: HBASE-28423
URL: https://issues.apache.org/jira/browse/HBASE-28423
Project: HBase
Issue Type: Improvement
Reporter: Bryan Beaudreault
Backup/Restore has support for including bulkloaded files in incremental
backups. There is a coprocessor hook which registers all bulkloads into a
backup:system_bulk table. A cleaner plugin ensures that these files are not
cleaned up from the archive until they are backed up. When an incremental
backup runs, the rows for those files are deleted from the system_bulk table
and the files are then cleaned up.
We have encountered two problems to be solved with this:
# The deletion process only happens during incremental backups, not full
backups. A full backup already includes all data in the table via a snapshot
export. So we should clear any pending bulkloads upon full backup.
# There is currently no linking of bulkload state to backupRoot. It's possible
to have multiple backupRoots for tables. For example, you might back up to two
destinations with different schedules. Currently, whichever backupRoot does an
incremental backup first will be the one to include the bulkloads, and the rows
are then deleted from the system_bulk table, so the other backupRoots never
include them. We need some sort of mapping of bulkload to backupRoot, and we
should only delete the rows from system_bulk once the files have been included
in all active backupRoots.
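To make the two proposals concrete, here is a minimal in-memory sketch of the
bookkeeping they imply. It is not the HBase backup API: the class
BulkLoadTracker and all of its methods are hypothetical names, and a plain map
stands in for the system_bulk table. Each registered bulkload remembers which
backupRoots still need it, and the row is dropped only once that set is empty.
A full backup marks the root as covered without copying anything, on the
assumption stated above that the snapshot export already contains the data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of per-backupRoot bulkload tracking; not HBase code. */
class BulkLoadTracker {
    // Active backupRoots; every bulkload must be covered by each of them.
    private final Set<String> activeRoots = new HashSet<>();
    // Stand-in for system_bulk: table -> (file -> roots that still need it).
    private final Map<String, Map<String, Set<String>>> pending = new HashMap<>();

    BulkLoadTracker(String... roots) {
        for (String root : roots) {
            activeRoots.add(root);
        }
    }

    /** Called where the coprocessor hook would register a bulkload. */
    void registerBulkLoad(String table, String file) {
        if (activeRoots.isEmpty()) {
            return; // nothing will ever back this file up, so do not pin it
        }
        pending.computeIfAbsent(table, t -> new HashMap<>())
               .put(file, new HashSet<>(activeRoots));
    }

    /** Incremental backup: returns the files this root must copy. */
    Set<String> incrementalBackup(String root, String table) {
        return markCovered(root, table);
    }

    /** Full backup: the snapshot already holds the data, so just clear. */
    void fullBackup(String root, String table) {
        markCovered(root, table);
    }

    /** True while at least one active root still needs the file. */
    boolean isPending(String table, String file) {
        Map<String, Set<String>> files = pending.get(table);
        return files != null && files.containsKey(file);
    }

    // Removes root from every pending file of the table and deletes rows
    // that no root is waiting on any more.
    private Set<String> markCovered(String root, String table) {
        Set<String> covered = new HashSet<>();
        Map<String, Set<String>> files = pending.get(table);
        if (files == null) {
            return covered;
        }
        Iterator<Map.Entry<String, Set<String>>> it = files.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            if (entry.getValue().remove(root)) {
                covered.add(entry.getKey());
            }
            if (entry.getValue().isEmpty()) {
                it.remove();
            }
        }
        return covered;
    }
}
```

With two roots, an incremental backup to the first root returns the file but
leaves it pending; it stops being pending only after the second root has run
either an incremental or a full backup. A real implementation would also have
to decide what happens to pending rows when a backupRoot is retired, which
this sketch does not address.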
--
This message was sent by Atlassian Jira
(v8.20.10#820010)