ankitsol commented on code in PR #6788:
URL: https://github.com/apache/hbase/pull/6788#discussion_r2149561312
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java:
##########
@@ -149,12 +149,20 @@ public void execute() throws IOException {
     try (Admin admin = conn.getAdmin()) {
       beginBackup(backupManager, backupInfo);
+      // Gather the bulk loads being tracked by the system, which can be deleted (since their data
+      // will be part of the snapshot being taken). We gather this list before taking the actual
+      // snapshots for the same reason as the log rolls.
+      List<BulkLoad> bulkLoadsToDelete = backupManager.readBulkloadRows(tableList);
Review Comment:
For points 1) & 2), I see two advantages in the approach coded in this PR.
First, the behaviour would be the same for both continuous and non-continuous
incremental backup (i.e. using bulkload files from the source cluster). Second,
using the source cluster's hfiles for bulkload operations, instead of hfiles
copied into the backup area, would reduce processing time and the storage cost
of the backup area. Backing up bulkload operations would also delay the backup
of WALs, since WALs and bulkload files are backed up serially.
The idea of backing up bulkload files was necessary when we were planning to
use WALPlayer with its bulkload restore capability. Now I don't see any
advantage to backing up bulkload files.
3) BackupObserver.preCommitStoreFile() is invoked only for bulkload
operations, so bulkloaded hfiles are copied only once; a minimal sketch of the
hook follows.
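For reference, here is a minimal, self-contained sketch of how such a
RegionObserver hook works. This is not the actual BackupObserver source; the
class and helper names are illustrative. The point it shows: preCommitStoreFile()
fires only on the bulk-load commit path, never for flushes or compactions, so
each bulk-loaded hfile is registered (and hence copied) exactly once.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver;
import org.apache.hadoop.hbase.util.Pair;

// Illustrative observer, not the real BackupObserver.
public class BulkLoadTrackingObserver implements RegionCoprocessor, RegionObserver {

  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
  }

  // Called only when hfiles are committed via bulk load, so this is the
  // single point where a bulk-loaded hfile can be recorded for backup.
  @Override
  public void preCommitStoreFile(ObserverContext<RegionCoprocessorEnvironment> ctx,
    byte[] family, List<Pair<Path, Path>> pairs) throws IOException {
    TableName table = ctx.getEnvironment().getRegion().getRegionInfo().getTable();
    for (Pair<Path, Path> p : pairs) {
      // p.getSecond() is the path the hfile is committed under.
      recordBulkLoad(table, family, p.getSecond());
    }
  }

  // Hypothetical helper: in hbase-backup this information ends up in the
  // backup system table so readBulkloadRows() can find it later.
  private void recordBulkLoad(TableName table, byte[] family, Path committedFile) {
    // persist table/family/path for the next backup to pick up
  }
}
```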
4) This code actually fixes a bug in the handling of bulkload operations,
without modifying core logic; the ordering sketch below shows why.
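To make 4) concrete, here is a condensed, self-contained sketch of the
ordering the new code establishes in FullTableBackupClient.execute().
Everything except readBulkloadRows() (visible in the diff above) is a
simplified stand-in, not the real hbase-backup API:

```java
import java.util.List;

// Stand-in types so the sketch compiles on its own; the real classes live
// under org.apache.hadoop.hbase.backup.
interface BulkLoad {}
interface BackupManager {
  List<BulkLoad> readBulkloadRows(List<String> tables); // as in the diff above
  void deleteBulkLoadedRows(List<BulkLoad> loads);      // hypothetical cleanup call
}

public class FullBackupOrderingSketch {

  // List the tracked bulk loads BEFORE taking snapshots, and delete the
  // tracking rows only AFTER the backup succeeds. A bulk load that lands
  // between the two steps stays tracked and is picked up by the next
  // backup, so the race cannot lose data.
  static void fullBackup(BackupManager backupManager, List<String> tableList) {
    List<BulkLoad> bulkLoadsToDelete = backupManager.readBulkloadRows(tableList);

    for (String table : tableList) {
      snapshotTable(table); // the snapshot already contains bulk-loaded hfiles
    }
    exportSnapshots();      // copy the snapshots to the backup destination

    // Safe to stop tracking these loads: their data is in the snapshots.
    backupManager.deleteBulkLoadedRows(bulkLoadsToDelete);
  }

  static void snapshotTable(String table) { /* admin.snapshot(...) in real code */ }
  static void exportSnapshots() { /* snapshot export in real code */ }
}
```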