ankitsol commented on code in PR #6788:
URL: https://github.com/apache/hbase/pull/6788#discussion_r2149561312


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java:
##########
@@ -149,12 +149,20 @@ public void execute() throws IOException {
     try (Admin admin = conn.getAdmin()) {
       beginBackup(backupManager, backupInfo);
 
+      // Gather the bulk loads being tracked by the system, which can be deleted (since their data
+      // will be part of the snapshot being taken). We gather this list before taking the actual
+      // snapshots for the same reason as the log rolls.
+      List<BulkLoad> bulkLoadsToDelete = backupManager.readBulkloadRows(tableList);

Review Comment:
   1) & 2) I see two advantages to the approach coded in this PR. First, the behaviour (i.e. reading bulkload files from the source cluster) would be the same for both continuous and non-continuous incremental backups. Second, using the source cluster's hfiles for bulkload operations, instead of copying them as backup hfiles, reduces both processing time and the storage cost of the backup area. Backing up bulkload files would also delay the backup of WALs, since the WAL backup and the bulkload-file backup run serially.
   
   The idea of backing up bulkload files was necessary when we were planning to use WALPlayer with its bulkload-restore capability. Now I don't see any advantage in backing up bulkload files.
   
   3) BackupObserver.preCommitStoreFile() is invoked only for bulkload operations, so each bulkloaded hfile is copied only once.
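   
   For reference, here is a minimal sketch (not the PR's code) of how a RegionObserver like BackupObserver hooks into the bulk-load path. preCommitStoreFile is triggered only when hfiles are committed by a bulk load, never by flushes or compactions, which is why each bulkloaded hfile is seen, and can be recorded or copied, exactly once:
   
   ```java
   import java.io.IOException;
   import java.util.List;
   import java.util.Optional;
   
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.hbase.coprocessor.ObserverContext;
   import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
   import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
   import org.apache.hadoop.hbase.coprocessor.RegionObserver;
   import org.apache.hadoop.hbase.util.Pair;
   
   // Illustrative observer, not the real BackupObserver implementation.
   public class BulkLoadTrackingObserver implements RegionCoprocessor, RegionObserver {
   
     @Override
     public Optional<RegionObserver> getRegionObserver() {
       return Optional.of(this);
     }
   
     // Fires once per column family per bulk load, with the (staging, final)
     // path pairs of the hfiles about to be committed.
     @Override
     public void preCommitStoreFile(ObserverContext<RegionCoprocessorEnvironment> ctx,
         byte[] family, List<Pair<Path, Path>> pairs) throws IOException {
       for (Pair<Path, Path> pair : pairs) {
         // The real observer records the hfile in the backup system table;
         // this sketch only logs the final path.
         System.out.println("Bulk load committing hfile: " + pair.getSecond());
       }
     }
   }
   ```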
   
   4) This code actually fixes a bug in the handling of bulkload operations; it does not modify any core logic.
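   
   To make the intended ordering concrete, here is a rough sketch of the full-backup flow with this change. It is illustrative rather than the exact PR code, and it assumes BulkLoad exposes its system-table row key via a getRowKey() accessor and that BackupManager offers a deleteBulkLoadedRows-style call (both are assumptions; see the actual diff for the real names):
   
   ```java
   // Fragment of FullTableBackupClient.execute(), after beginBackup():
   
   // 1. Capture the tracked bulk loads BEFORE the snapshots, so that any
   //    bulk load landing after this point is kept for the next backup.
   List<BulkLoad> bulkLoadsToDelete = backupManager.readBulkloadRows(tableList);
   
   // 2. Roll the WALs and snapshot every table in tableList (omitted).
   
   // 3. Only after the snapshots succeed is it safe to drop the tracked
   //    rows: the bulkloaded data is now captured by the snapshots.
   List<byte[]> rowsToDelete = new ArrayList<>();
   for (BulkLoad bulkLoad : bulkLoadsToDelete) {
     rowsToDelete.add(bulkLoad.getRowKey()); // assumed accessor
   }
   backupManager.deleteBulkLoadedRows(rowsToDelete); // assumed API
   ```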


