rmdmattingly commented on code in PR #5868:
URL: https://github.com/apache/hbase/pull/5868#discussion_r1608768506
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManager.java:
##########
@@ -295,51 +298,31 @@ public ArrayList<BackupImage> getAncestors(BackupInfo
backupInfo) throws IOExcep
.withRootDir(backup.getBackupRootDir()).withTableList(backup.getTableNames())
.withStartTime(backup.getStartTs()).withCompleteTime(backup.getCompleteTs()).build();
- // Only direct ancestors for a backup are required and not entire
history of backup for this
- // table resulting in verifying all of the previous backups which is
unnecessary and backup
- // paths need not be valid beyond the lifetime of a backup.
- //
- // RootDir is way of grouping a single backup including one full and
many incremental backups
+ // If the image has a different rootDir, it cannot be an ancestor.
if (!image.getRootDir().equals(backupInfo.getBackupRootDir())) {
continue;
}
- // add the full backup image as an ancestor until the last incremental
backup
- if (backup.getType().equals(BackupType.FULL)) {
- // check the backup image coverage, if previous image could be covered
by the newer ones,
- // then no need to add
- if (!BackupManifest.canCoverImage(ancestors, image)) {
- ancestors.add(image);
- }
+ // The ancestors consist of the most recent FULL backups that cover the
list of tables
+ // required in the new backup and all INCREMENTAL backups that came
after one of those FULL
+ // backups.
+ if (backup.getType().equals(BackupType.INCREMENTAL)) {
Review Comment:
I wrote up [this
test](https://github.com/HubSpot/hbase/compare/hubspot-2.6...HubSpot:hbase:inc-backup-tablesets)
which only proves your point that incremental backups ignore the tableset
input — ideally, from my perspective, this test would fail at
```java
// contains all 200 rows??? why/how/are they correct
assertEquals(rowCount, 200);
```
or the API wouldn't accept a set of tables to backup
As for why I'm seeing different backup contents at my day job, it's a good
question and probably not as easy to reproduce in a unit test. I'm picking up a
years long backup system refactor after a colleague's departure, so I probably
won't have a quick answer there either, but will continue to dig.
All this to say, you're correct here and I want to reiterate the
improvements I mentioned above, so I think we should ship these changes 🚢
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]