[ 
https://issues.apache.org/jira/browse/HBASE-28562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843008#comment-17843008
 ] 

Ray Mattingly commented on HBASE-28562:
---------------------------------------

Yes, we've experienced huge backup manifests due to bugs in 
BackupManager#getAncestors and its underlying BackupManifest#canCoverImage method.

The BackupManifest#canCoverImage method specifies that its fullImages parameter 
should contain only full backup images, never incremental ones. The parameter 
name implies this, and [a comment makes it 
explicit|https://github.com/apache/hbase/blob/2c3abae18aa35e2693b64b143316817d4569d0c3/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManifest.java#L614]:

"each image of fullImages must not be an incremental image"

But we pass in all ancestors, including incremental images, to this method. For 
example: 
[https://github.com/apache/hbase/blob/6b672cc0717e762ecaad203714099b962c035ef0/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManager.java#L320]

And BackupManifest#canCoverImage does not enforce that precondition: instead of 
throwing an IllegalArgumentException, it proceeds and simply returns false 
[if any of the given ancestors are incremental backups|https://github.com/apache/hbase/blob/2c3abae18aa35e2693b64b143316817d4569d0c3/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManifest.java#L619]!
This means that once an incremental backup ancestor has been found, all 
subsequent backup images are also considered ancestors, which balloons the 
backup manifest size. This may also be part of why checking the entire backup 
history is problematic for you. We probably need to substantially refactor 
getAncestors and/or canCoverImage.
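
A minimal sketch of canCoverImage enforcing its documented precondition, so that passing an incremental image fails loudly instead of quietly returning false. {{CanCoverImageSketch}}, {{Type}} and {{Image}} are hypothetical stand-ins for BackupType and BackupImage, not the HBase classes, and the coverage rule used here (every table of the image appears in some full image) is an assumption about the intended semantics.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; Type and Image are simplified stand-ins, NOT HBase's BackupType/BackupImage.
class CanCoverImageSketch {
  enum Type { FULL, INCREMENTAL }

  static final class Image {
    final String backupId;
    final Type type;
    final Set<String> tables;

    Image(String backupId, Type type, String... tables) {
      this.backupId = backupId;
      this.type = type;
      this.tables = new HashSet<>(Arrays.asList(tables));
    }
  }

  // Returns true if every table of 'image' is contained in at least one of 'fullImages'.
  // Throws instead of returning false when the precondition is violated.
  static boolean canCoverImage(List<Image> fullImages, Image image) {
    Set<String> covered = new HashSet<>();
    for (Image full : fullImages) {
      if (full.type != Type.FULL) {
        throw new IllegalArgumentException(
            "fullImages must not contain incremental images, got " + full.backupId);
      }
      covered.addAll(full.tables);
    }
    return covered.containsAll(image.tables);
  }

  public static void main(String[] args) {
    Image full = new Image("backup_full", Type.FULL, "experiment");
    Image incr = new Image("backup_incr", Type.INCREMENTAL, "experiment");
    System.out.println(canCoverImage(Arrays.asList(full), incr)); // prints true
    try {
      canCoverImage(Arrays.asList(full, incr), incr);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}

Failing fast means a caller that mixes incremental ancestors into fullImages would show up as an exception in tests rather than as a silently oversized manifest.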

> Ancestor calculation of backups is wrong
> ----------------------------------------
>
>                 Key: HBASE-28562
>                 URL: https://issues.apache.org/jira/browse/HBASE-28562
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>    Affects Versions: 2.6.0, 3.0.0
>            Reporter: Dieter De Paepe
>            Priority: Major
>              Labels: pull-request-available
>
> This is the same issue as HBASE-25870, but I think the fix there was wrong.
> This issue can prevent the creation of (incremental) backups when data of 
> unrelated backups has been damaged on the backup storage.
> Minimal example to reproduce, with HBase built from source:
>  * Add the following to `conf/hbase-site.xml` to enable backups:
> {code:xml}
> <property>
>   <name>hbase.backup.enable</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hbase.master.logcleaner.plugins</name>
>   <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveProcedureWALCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveMasterLocalStoreWALCleaner,org.apache.hadoop.hbase.backup.master.BackupLogCleaner</value>
> </property>
> <property>
>   <name>hbase.procedure.master.classes</name>
>   <value>org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager</value>
> </property>
> <property>
>   <name>hbase.procedure.regionserver.classes</name>
>   <value>org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager</value>
> </property>
> <property>
>   <name>hbase.coprocessor.region.classes</name>
>   <value>org.apache.hadoop.hbase.backup.BackupObserver</value>
> </property>
> <property>
>   <name>hbase.fs.tmp.dir</name>
>   <value>file:/tmp/hbase-tmp</value>
> </property>
> {code}
>  * Start HBase and open a shell: {{bin/start-hbase.sh}}, {{bin/hbase shell}}
>  * Execute the following commands ("put" and "create" in the HBase shell, the 
> other commands on the command line):
> {code:java}
> create 'experiment', 'fam' 
> put 'experiment', 'row1', 'fam:b', 'value1'
> bin/hbase backup create full file:/tmp/hbasebackup
> Backup session backup_1714649896776 finished. Status: SUCCESS
> put 'experiment', 'row2', 'fam:b', 'value2'
> bin/hbase backup create incremental file:/tmp/hbasebackup
> Backup session backup_1714649920488 finished. Status: SUCCESS
> put 'experiment', 'row3', 'fam:b', 'value3'
> bin/hbase backup create incremental file:/tmp/hbasebackup
> Backup session backup_1714650054960 finished. Status: SUCCESS
> (Delete the files corresponding to the first incremental backup, backup_1714649920488 in this example)
> put 'experiment', 'row4', 'fam:a', 'value4'
> bin/hbase backup create full file:/tmp/hbasebackup
> Backup session backup_1714650236911 finished. Status: SUCCESS
> put 'experiment', 'row5', 'fam:a', 'value5'
> bin/hbase backup create incremental file:/tmp/hbasebackup
> Backup session backup_1714650289957 finished. Status: SUCCESS
> put 'experiment', 'row6', 'fam:a', 'value6'
> bin/hbase backup create incremental file:/tmp/hbasebackup
> 2024-05-02T13:45:27,534 ERROR [main {}] impl.BackupManifest: file:/tmp/hbasebackup/backup_1714649920488 does not exist
> 2024-05-02T13:45:27,534 ERROR [main {}] impl.TableBackupClient: Unexpected Exception : file:/tmp/hbasebackup/backup_1714649920488 does not exist
> org.apache.hadoop.hbase.backup.impl.BackupException: file:/tmp/hbasebackup/backup_1714649920488 does not exist
>     at org.apache.hadoop.hbase.backup.impl.BackupManifest.<init>(BackupManifest.java:451) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupManifest.<init>(BackupManifest.java:402) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupManager.getAncestors(BackupManager.java:331) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupManager.getAncestors(BackupManager.java:353) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.TableBackupClient.addManifest(TableBackupClient.java:286) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.TableBackupClient.completeBackup(TableBackupClient.java:351) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:314) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:603) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:345) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) ~[hadoop-common-3.3.5.jar:?]
>     at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
> 2024-05-02T13:45:27,538 ERROR [main {}] impl.TableBackupClient: BackupId=backup_1714650324099,startts=1714650324486,failedts=1714650327538,failedphase=STORE_MANIFEST,failedmessage=file:/tmp/hbasebackup/backup_1714649920488 does not exist
> 2024-05-02T13:45:28,763 ERROR [main {}] impl.TableBackupClient: Backup backup_1714650324099 failed.
> Backup session finished. Status: FAILURE
> 2024-05-02T13:45:28,764 ERROR [main {}] backup.BackupDriver: Error running command-line tool
> java.io.IOException: org.apache.hadoop.hbase.backup.impl.BackupException: file:/tmp/hbasebackup/backup_1714649920488 does not exist
>     at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:319) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:603) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:345) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) ~[hadoop-common-3.3.5.jar:?]
>     at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
> Caused by: org.apache.hadoop.hbase.backup.impl.BackupException: file:/tmp/hbasebackup/backup_1714649920488 does not exist
>     at org.apache.hadoop.hbase.backup.impl.BackupManifest.<init>(BackupManifest.java:451) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupManifest.<init>(BackupManifest.java:402) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupManager.getAncestors(BackupManager.java:331) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.BackupManager.getAncestors(BackupManager.java:353) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.TableBackupClient.addManifest(TableBackupClient.java:286) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.TableBackupClient.completeBackup(TableBackupClient.java:351) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:314) ~[hbase-backup-2.6.1-SNAPSHOT.jar:2.6.1-SNAPSHOT]
>     ... 7 more
> {code}
> Currently working on a PR.
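
One possible shape for the getAncestors refactor, sketched against the backup history from the reproduction above: walk the history from newest to oldest, keep each image that still contributes data for the requested tables, and stop as soon as every table is covered by a newer full image. Only full images ever count toward coverage, so an incremental ancestor cannot drag in everything older. {{AncestorSketch}}, {{Image}} and {{ancestorIds}} are hypothetical names; the sketch reads no manifests and is neither the HBase implementation nor the pending PR.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; Type and Image are simplified stand-ins, NOT HBase's BackupInfo/BackupImage.
class AncestorSketch {
  enum Type { FULL, INCREMENTAL }

  static final class Image {
    final String backupId;
    final Type type;
    final Set<String> tables;

    Image(String backupId, Type type, String... tables) {
      this.backupId = backupId;
      this.type = type;
      this.tables = new HashSet<>(Arrays.asList(tables));
    }
  }

  // Walks the history newest-first and returns the IDs of the images the new backup depends on.
  // Only FULL images add to 'coveredByFull', so an incremental ancestor never widens the walk.
  static List<String> ancestorIds(Set<String> tables, List<Image> historyNewestFirst) {
    List<String> ancestors = new ArrayList<>();
    Set<String> coveredByFull = new HashSet<>();
    for (Image image : historyNewestFirst) {
      if (coveredByFull.containsAll(tables)) {
        break; // every table has a newer full image: older backups are not needed
      }
      // Tables of this image that belong to the requested set and are not yet covered.
      Set<String> relevant = new HashSet<>(image.tables);
      relevant.retainAll(tables);
      relevant.removeAll(coveredByFull);
      if (relevant.isEmpty()) {
        continue; // unrelated backup, or superseded by a newer full image
      }
      ancestors.add(image.backupId);
      if (image.type == Type.FULL) {
        coveredByFull.addAll(relevant);
      }
    }
    return ancestors;
  }

  public static void main(String[] args) {
    // Backup history from the reproduction above, newest first.
    List<Image> history = Arrays.asList(
        new Image("backup_1714650289957", Type.INCREMENTAL, "experiment"),
        new Image("backup_1714650236911", Type.FULL, "experiment"),
        new Image("backup_1714650054960", Type.INCREMENTAL, "experiment"),
        new Image("backup_1714649920488", Type.INCREMENTAL, "experiment"),
        new Image("backup_1714649896776", Type.FULL, "experiment"));
    Set<String> tables = new HashSet<>(Arrays.asList("experiment"));
    // prints [backup_1714650289957, backup_1714650236911]
    System.out.println(ancestorIds(tables, history));
  }
}
{code}

With this history the walk stops at the full backup backup_1714650236911, so the deleted backup_1714649920488 is never visited.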



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
