taklwu commented on code in PR #6788:
URL: https://github.com/apache/hbase/pull/6788#discussion_r2153681156


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java:
##########
@@ -149,12 +149,20 @@ public void execute() throws IOException {
     try (Admin admin = conn.getAdmin()) {
       beginBackup(backupManager, backupInfo);
 
+      // Gather the bulk loads being tracked by the system, which can be deleted (since their data
+      // will be part of the snapshot being taken). We gather this list before taking the actual
+      // snapshots for the same reason as the log rolls.
+      List<BulkLoad> bulkLoadsToDelete = backupManager.readBulkloadRows(tableList);

Review Comment:
   So, to ask it differently: is this a one-way approach for continuous backup if we couple it with the HBASE-29003 optimization that reduces the additional bulkload HFiles on the source cluster?
   
   And without this change, or, as @Kota-SH pointed out, if the source cluster/directory is not accessible, which backup can we use? In particular, can we still do incremental recovery?
   
   My two cents: this approach, building on top of HBASE-29003, seems reasonable. At least incremental backup already has this code change that uses the source cluster/storage. I wonder whether the feedback on the design docs is also suggesting that we work more closely with the incremental backup logic, so that we avoid introducing similar logic that in fact serves the same purpose.
   
   Meanwhile, it's worth thinking about:
   a. The original plan, which copies all the bulkloaded HFiles between incremental backups, was too slow. Other than this approach, do we have any alternative?
   b. Are we 100% against continuous backup reading HFiles/bulkload HFiles from the source storage? HBASE-29003 made a very good point about storage usage, especially for HDFS use cases with 3 replicas.
   
   ---
   
   For points 3 and 4, I assume @ankitsol has already answered, so I have no further comments.



