vinayakphegde commented on code in PR #6788:
URL: https://github.com/apache/hbase/pull/6788#discussion_r2149351693


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java:
##########
@@ -149,12 +149,20 @@ public void execute() throws IOException {
     try (Admin admin = conn.getAdmin()) {
       beginBackup(backupManager, backupInfo);
 
+      // Gather the bulk loads being tracked by the system, which can be deleted (since their data
+      // will be part of the snapshot being taken). We gather this list before taking the actual
+      // snapshots for the same reason as the log rolls.
+      List<BulkLoad> bulkLoadsToDelete = backupManager.readBulkloadRows(tableList);

Review Comment:
   Okay, I have a few questions:
   1. Why is this needed? Can't we pick up the bulk-loaded files from the backup location, just as we do with WAL files, and handle them the same way?
   2. Also, one thing I noticed: by default, when backup is enabled on the cluster, BackupObserver starts accumulating all HFiles (both store files and bulk-loaded files). These files are registered under bulkLoadTableName so that the cleaner chores don't delete them before an incremental backup can pick them up. But in our case we don't want this behavior, right? The whole idea is to ship files to the backup location, not to retain them on the source cluster. I think we should enable BackupObserver only for the traditional (non-continuous) backup path; otherwise we're unnecessarily accumulating files on the source cluster. (A sketch of the gating I have in mind follows this list.)
   3. I also noticed this:
   - In [this code](https://github.com/apache/hbase/blob/master/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java#L290-L292), we read all WAL files written since the last backup, convert them to HFiles, and store them in the backup location.
   - Then, in [this line](https://github.com/apache/hbase/blob/master/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java#L319), we read the bulk-loaded files from the table and back them up as well.
   - However, the list of bulk-loaded files also includes regular store files, as seen [here](https://github.com/apache/hbase/blob/28c757d45aa1797bbee6892433aec16a460e765e/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/BackupObserver.java#L75), which means we're potentially processing normal HFiles twice? (The second sketch below shows one way this could be avoided.)
   4. Regarding “I added here because it was necessary for the bulkload test I added”: ideally, we shouldn't modify core logic just to support a specific test case. It would be better to adapt the test to the intended behavior.
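
   To make point 2 concrete, here's a rough sketch of the gating I have in mind. This is illustrative only: the `hbase.backup.continuous.enabled` flag is made up, `registerBulkLoad` is just a placeholder for the existing system-table registration, and the real BackupObserver also implements RegionCoprocessor, which I've omitted for brevity.

   ```java
   import java.io.IOException;
   import java.util.List;
   import java.util.Map;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.hbase.coprocessor.ObserverContext;
   import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
   import org.apache.hadoop.hbase.coprocessor.RegionObserver;
   import org.apache.hadoop.hbase.util.Pair;

   /** Hypothetical variant of BackupObserver that skips registration for continuous backup. */
   public class ContinuousBackupAwareObserver implements RegionObserver {

     @Override
     public void postBulkLoadHFile(ObserverContext<? extends RegionCoprocessorEnvironment> ctx,
       List<Pair<byte[], String>> stagingFamilyPaths, Map<byte[], List<Path>> finalPaths)
       throws IOException {
       if (finalPaths == null) {
         return; // the bulk load committed nothing, so there is nothing to track
       }
       Configuration conf = ctx.getEnvironment().getConfiguration();
       // Assumed flag: true when the table's data already flows continuously to the
       // backup location, so pinning HFiles on the source cluster buys us nothing.
       if (conf.getBoolean("hbase.backup.continuous.enabled", false)) {
         return;
       }
       registerBulkLoad(ctx, finalPaths);
     }

     // Placeholder for the existing registration into the backup system table,
     // i.e. the traditional (non-continuous) path stays unchanged.
     private void registerBulkLoad(ObserverContext<? extends RegionCoprocessorEnvironment> ctx,
       Map<byte[], List<Path>> finalPaths) throws IOException {
     }
   }
   ```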
   
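   Similarly for point 3, one way to avoid the double processing would be to record the origin of each tracked file when it is registered and filter on it during the incremental backup. The `HFileOrigin` tag and `TrackedFile` stand-in below are purely hypothetical; nothing in the backup system table records an origin today.

   ```java
   import java.util.List;
   import java.util.stream.Collectors;

   public class BulkLoadFilterSketch {

     /** Hypothetical origin tag for each HFile registered by the observer. */
     enum HFileOrigin { BULK_LOAD, STORE_FILE_COMMIT }

     /** Minimal stand-in for the real org.apache.hadoop.hbase.backup.impl.BulkLoad entry. */
     static final class TrackedFile {
       final String path;
       final HFileOrigin origin;

       TrackedFile(String path, HFileOrigin origin) {
         this.path = path;
         this.origin = origin;
       }
     }

     /**
      * Copy only genuine bulk loads: regular store-file commits are already covered
      * by the WAL-to-HFile conversion step, so copying them again would process the
      * same data twice.
      */
     static List<TrackedFile> filesToBackUp(List<TrackedFile> tracked) {
       return tracked.stream()
         .filter(f -> f.origin == HFileOrigin.BULK_LOAD)
         .collect(Collectors.toList());
     }
   }
   ```
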
   @anmolnar @kgeisz @taklwu — please share your thoughts as well.
   



