DieterDP-ng commented on code in PR #7216:
URL: https://github.com/apache/hbase/pull/7216#discussion_r2294306835
########## src/main/asciidoc/_chapters/backup_restore.adoc: ##########
@@ -804,16 +804,18 @@ providing a comparable level of security. This is a manual step which users *mus
 [[br.technical.details]]
 == Technical Details of Incremental Backup and Restore
 
-HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore
-solutions, such as those that only used HBase Export and Import APIs. Incremental backups use Write Ahead Logs (WALs) to capture
-the data changes since the previous backup was created. A WAL roll (create new WALs) is executed across all RegionServers to track
-the WALs that need to be in the backup.
+HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore solutions, such as those that only used HBase Export and Import APIs.
+Incremental backups use Write Ahead Logs (WALs) to capture the data changes since the previous backup was created. A WAL roll (create new WALs) is executed across all RegionServers to track the WALs that need to be in the backup.
+In addition to WALs, incremental backups also track bulk-loaded HFiles for tables under backup.
 
-After the incremental backup image is created, the source backup files usually are on same node as the data source. A process similar
-to the DistCp (distributed copy) tool is used to move the source backup files to the target file systems. When a table restore operation
-starts, a two-step process is initiated. First, the full backup is restored from the full backup image. Second, all WAL files from
-incremental backups between the last full backup and the incremental backup being restored are converted to HFiles, which the HBase
-Bulk Load utility automatically imports as restored data in the table.
+Incremental backup gathers all WAL files generated since the last backup from the source cluster,
+converts them to HFiles in a `.tmp` directory under the `BACKUP_ROOT`, and then moves these HFiles to their final location under the backup root directory to form the backup image.
+It also reads bulk load records from the backup system table, forms the paths for the corresponding bulk-loaded HFiles, and copies those files to the backup destination.
+This ensures bulk-loaded files are preserved and not deleted by cleaner chores before the backup completes.

Review Comment:
   I'd rephrase this: `Bulk-loaded files are preserved (not deleted by cleaner chores) until they've been included in a backup (for each backup root).`

########## src/main/asciidoc/_chapters/backup_restore.adoc: ##########
@@ -872,8 +874,10 @@ data at the full 80MB/s and `-w` is used to limit the job from spawning 16 worke
 Like we did for full backups, we have to understand the incremental backup process to approximate its runtime and cost.
 
-* Identify new write-ahead logs since last full or incremental backup: negligible. Apriori knowledge from the backup system table(s).
+* Identify new write-ahead logs since the last full or incremental backup: negligible. Apriori knowledge from the backup system table(s).
 * Read, filter, and write "minimized" HFiles equivalent to the WALs: dominated by the speed of writing data. Relative to write speed of HDFS.
+* Read bulk load records from the backup system table, form the paths for bulk-loaded HFiles, and copy them to the backup destination.
+These entries in the backup system table are not cleaned until the backup is marked complete to ensure the cleaner chore does not delete the files.

Review Comment:
   This line can be scrapped in my opinion, it's already mentioned above, and it makes less sense to mention this in a "performance" section.
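The performance bullets quoted in the hunk above reason that identifying new WALs is negligible, so an incremental backup's runtime is dominated by the throughput of converting WALs to HFiles and copying bulk-loaded HFiles. As a rough illustration of that reasoning only, here is a hedged back-of-the-envelope sketch; the function name, the default worker count, and the assumption that each worker sustains the 80 MB/s figure mentioned in the section are illustrative, not measured HBase behavior:

```python
def estimate_backup_minutes(wal_bytes, bulkload_bytes,
                            throughput_mb_s=80.0, workers=8):
    """Rough wall-clock estimate for an incremental backup.

    Assumes WAL identification is negligible (apriori knowledge from the
    backup system table) and that converting WALs to HFiles plus copying
    bulk-loaded HFiles together sustain workers * throughput_mb_s.
    All defaults are illustrative assumptions, not measured values.
    """
    total_mb = (wal_bytes + bulkload_bytes) / (1024 * 1024)
    seconds = total_mb / (throughput_mb_s * workers)
    return seconds / 60.0

# Example: 200 GiB of new WALs plus 50 GiB of bulk-loaded HFiles,
# 8 workers each sustaining ~80 MB/s -> 256000 MiB / 640 MB/s = 400 s.
print(round(estimate_backup_minutes(200 * 1024**3, 50 * 1024**3), 1))  # -> 6.7
```

In practice the copy stage can be throttled with the backup tool's worker option (the `-w` flag mentioned in the hunk's surrounding text), which trades runtime for reduced cluster load.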
########## src/main/asciidoc/_chapters/backup_restore.adoc: ##########
@@ -872,8 +874,10 @@ data at the full 80MB/s and `-w` is used to limit the job from spawning 16 worke
 Like we did for full backups, we have to understand the incremental backup process to approximate its runtime and cost.
 
-* Identify new write-ahead logs since last full or incremental backup: negligible. Apriori knowledge from the backup system table(s).
+* Identify new write-ahead logs since the last full or incremental backup: negligible. Apriori knowledge from the backup system table(s).
 * Read, filter, and write "minimized" HFiles equivalent to the WALs: dominated by the speed of writing data. Relative to write speed of HDFS.
+* Read bulk load records from the backup system table, form the paths for bulk-loaded HFiles, and copy them to the backup destination.

Review Comment:
   @hgromer - I think you can contribute a summary of HBASE-27659 to this part of the HBase docs.

########## src/main/asciidoc/_chapters/backup_restore.adoc: ##########
@@ -804,16 +804,18 @@ providing a comparable level of security. This is a manual step which users *mus
 [[br.technical.details]]
 == Technical Details of Incremental Backup and Restore
 
-HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore
-solutions, such as those that only used HBase Export and Import APIs. Incremental backups use Write Ahead Logs (WALs) to capture
-the data changes since the previous backup was created. A WAL roll (create new WALs) is executed across all RegionServers to track
-the WALs that need to be in the backup.
+HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore solutions, such as those that only used HBase Export and Import APIs.
+Incremental backups use Write Ahead Logs (WALs) to capture the data changes since the previous backup was created. A WAL roll (create new WALs) is executed across all RegionServers to track the WALs that need to be in the backup.
+In addition to WALs, incremental backups also track bulk-loaded HFiles for tables under backup.
 
-After the incremental backup image is created, the source backup files usually are on same node as the data source. A process similar
-to the DistCp (distributed copy) tool is used to move the source backup files to the target file systems. When a table restore operation
-starts, a two-step process is initiated. First, the full backup is restored from the full backup image. Second, all WAL files from
-incremental backups between the last full backup and the incremental backup being restored are converted to HFiles, which the HBase
-Bulk Load utility automatically imports as restored data in the table.
+Incremental backup gathers all WAL files generated since the last backup from the source cluster,
+converts them to HFiles in a `.tmp` directory under the `BACKUP_ROOT`, and then moves these HFiles to their final location under the backup root directory to form the backup image.
+It also reads bulk load records from the backup system table, forms the paths for the corresponding bulk-loaded HFiles, and copies those files to the backup destination.
+This ensures bulk-loaded files are preserved and not deleted by cleaner chores before the backup completes.
+A process similar to the DistCp (distributed copy) tool is used to move the backup files to the target file systems.

Review Comment:
   Nit: `file system`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
