taklwu commented on code in PR #6788:
URL: https://github.com/apache/hbase/pull/6788#discussion_r2153681156
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/FullTableBackupClient.java:
##########
@@ -149,12 +149,20 @@ public void execute() throws IOException {
try (Admin admin = conn.getAdmin()) {
beginBackup(backupManager, backupInfo);
+ // Gather the bulk loads being tracked by the system, which can be deleted (since their data
+ // will be part of the snapshot being taken). We gather this list before taking the actual
+ // snapshots for the same reason as the log rolls.
+ List<BulkLoad> bulkLoadsToDelete = backupManager.readBulkloadRows(tableList);
Review Comment:
So, to ask this differently: is this a one-way approach for continuous
backup if we couple it with the optimization in HBASE-29003, which reduces
the additional bulkload HFiles on the source cluster?
And without this change, or, as @Kota-SH pointed out, if the source
cluster/directory is not accessible, which backup can we use? In particular,
can we still do incremental recovery?
My two cents on this approach, building on top of HBASE-29003, is that it
seems reasonable. At least incremental backup already has this code change,
which uses the source cluster/storage. I was wondering whether the feedback
on the design docs is also suggesting that we work more closely with the
logic of incremental backup, so that we avoid introducing similar logic that
in fact serves the same purpose.
Meanwhile, it's worth thinking about:
a. The original plan, which copies all the bulkloaded HFiles between
incremental backups, was too slow. Other than this approach, do we have any
alternative?
b. Are we 100% against continuous backup reading HFiles/bulkload HFiles
from the source storage? HBASE-29003 made a very good point about storage
usage, especially for HDFS use cases with 3 replicas.
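
To make the ordering in the quoted hunk concrete, here is a minimal,
self-contained sketch of "gather the tracked bulk loads, take the snapshot,
then delete only what was gathered". It is an in-memory model, not HBase
code: `BulkLoadOrderingSketch`, `Tracker`, and `fullBackup` are hypothetical
names, and the assumption that a bulk load registered during the snapshot
must stay tracked is my reading of "for the same reason as the log rolls".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; none of these types are HBase APIs.
public class BulkLoadOrderingSketch {

  /** Stand-in for the system table that tracks bulk-loaded HFiles. */
  static final class Tracker {
    private final List<String> rows = new ArrayList<>();

    void register(String hfile) {
      rows.add(hfile);
    }

    /** Returns a copy, so later registrations do not leak into the result. */
    List<String> read() {
      return new ArrayList<>(rows);
    }

    void delete(List<String> toDelete) {
      rows.removeAll(toDelete);
    }
  }

  /** Runs gather -> snapshot -> delete and returns what is still tracked. */
  static List<String> fullBackup(Tracker tracker, Runnable snapshot) {
    // 1. Gather first: everything listed here exists before the snapshot starts.
    List<String> toDelete = tracker.read();
    // 2. Take the snapshot; new bulk loads may be registered while it runs.
    snapshot.run();
    // 3. Delete only the rows gathered in step 1.
    tracker.delete(toDelete);
    return tracker.read();
  }

  public static void main(String[] args) {
    Tracker tracker = new Tracker();
    tracker.register("hfile-1");
    tracker.register("hfile-2");
    // Simulate a bulk load that lands while the snapshot is being taken.
    List<String> remaining = fullBackup(tracker, () -> tracker.register("hfile-3"));
    System.out.println(remaining); // prints [hfile-3]
  }
}
```

In this model, gathering after the snapshot instead would also delete
`hfile-3`, even though nothing guarantees the snapshot contains it.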
---
For points 3 and 4, I assume @ankitsol has already answered, so I don't have
any comments.