ankitsol commented on code in PR #6788:
URL: https://github.com/apache/hbase/pull/6788#discussion_r2149561312
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java:
##########
@@ -149,12 +149,20 @@ public void execute() throws IOException {
     try (Admin admin = conn.getAdmin()) {
       beginBackup(backupManager, backupInfo);
+      // Gather the bulk loads being tracked by the system, which can be deleted (since their data
+      // will be part of the snapshot being taken). We gather this list before taking the actual
+      // snapshots for the same reason as the log rolls.
+      List<BulkLoad> bulkLoadsToDelete = backupManager.readBulkloadRows(tableList);
Review Comment:
1) & 2) I see two advantages to the approach coded in this PR. First, the
behaviour is the same for both continuous and non-continuous incremental
backup (i.e. using bulkload files from the source cluster). Second, using the
source cluster's hfiles for bulkload operations, instead of copying them into
the backup area, reduces both processing time and the storage cost of the
backup area. Backing up bulkload files would also delay the backup of WALs,
since WALs and bulkload files are backed up serially.
Backing up bulkload files was necessary when we were planning to use WALPlayer
with its bulkload restore capability. Now I don't see any advantage in backing
them up.
3) BackupObserver.preCommitStoreFile() is invoked only for bulkload
operations, so bulkloaded hfiles are copied only once (see the sketch below).
4) This code actually resolves a bug in the handling of bulkload operations;
it does not modify core logic.