[ 
https://issues.apache.org/jira/browse/HBASE-29497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013059#comment-18013059
 ] 

Vinayak Hegde commented on HBASE-29497:
---------------------------------------

Thanks, [~dieterdp_ng], for pointing that out.

I also noticed one more thing: in section 11, the incremental backup and 
restore steps are not described correctly.

The actual flow is:
 * We collect the WAL files generated since the last backup in the source 
cluster.

 * We convert those WAL files to HFiles and copy them to the backup location 
during the incremental backup phase.

 * We then bulk load those HFiles in the restore phase.
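
To make the split between the two phases concrete, here is a toy sketch of the three steps above. It is not HBase code: every name in it (BackupImage, collect_wals, incremental_backup, restore) is hypothetical, and WALs and HFiles are modeled as plain file names. The only point is that the WAL-to-HFile conversion belongs to the backup phase and that restore only bulk loads what is already in the image.

```python
# Toy model only: these names are hypothetical and are not HBase APIs.
from dataclasses import dataclass, field


@dataclass
class BackupImage:
    """Stands in for the backup location; holds HFile names."""
    hfiles: list = field(default_factory=list)


def collect_wals(all_wals, last_backup_ts):
    """Step 1: pick the WALs written after the last backup on the source cluster."""
    return [name for name, ts in all_wals if ts > last_backup_ts]


def incremental_backup(all_wals, last_backup_ts):
    """Steps 1 and 2: collect WALs, convert them to HFiles, store them in the image."""
    wals = collect_wals(all_wals, last_backup_ts)
    # The WAL-to-HFile conversion happens here, in the backup phase.
    return BackupImage(hfiles=[w.replace(".wal", ".hfile") for w in wals])


def restore(image, table):
    """Step 3: restore only bulk loads HFiles that already exist in the image."""
    table.extend(image.hfiles)
    return table


wals = [("a.wal", 10), ("b.wal", 20), ("c.wal", 30)]
image = incremental_backup(wals, last_backup_ts=15)
restored = restore(image, table=[])
print(restored)  # ['b.hfile', 'c.hfile']
```

Note that restore() never sees a WAL in this model, which is the property the current wording in section 11 gets wrong.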

But in that section, the "Convert those WAL files to HFiles" step is combined 
with the restore phase instead of the incremental backup phase. Maybe we should 
update the wording there?

How about this:
{code:java}
HBase incremental backups enable more efficient capture of HBase table images 
than previous attempts at serial backup and restore solutions, such as those 
that only used HBase Export and Import APIs. Incremental backups use Write 
Ahead Logs (WALs) to capture the data changes since the previous backup was 
created. A WAL roll (creating new WALs) is executed across all RegionServers 
to track the WALs that need to be in the backup.

Incremental backup gathers all WAL files generated since the last backup from 
the source cluster, converts them to HFiles in a `.tmp` directory under the 
`BACKUP_ROOT`, and then moves these HFiles to their final location under the 
backup root directory to form the backup image. A process similar to the DistCp 
(distributed copy) tool is used to move the backup files to the target file 
systems.

When a table restore operation starts, a two-step process is initiated. First, 
the full backup is restored from the full backup image. Second, all HFiles from 
incremental backups between the last full backup and the incremental backup 
being restored are bulk loaded into the table using the HBase Bulk Load utility.

You can only restore on a live HBase cluster because the data must be 
redistributed to complete the restore operation successfully.
{code}
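
The two-step restore in the proposed wording implies an ordering over backup images: the most recent full backup at or before the target, then every incremental up to and including the target. Below is a minimal sketch of that selection. It assumes a single linear backup history, and the (backup_id, kind) tuples and the images_to_restore name are hypothetical; how HBase actually resolves image dependencies may differ.

```python
# Toy model only: the tuple layout and function name are hypothetical.
def images_to_restore(history, target_id):
    """Return the backup ids to apply, in order, to restore `target_id`.

    `history` is a list of (backup_id, kind) tuples in creation order,
    where kind is "full" or "incremental".
    """
    ids = [backup_id for backup_id, _ in history]
    if target_id not in ids:
        raise ValueError(f"unknown backup id: {target_id}")
    end = ids.index(target_id)
    # Walk backwards to the most recent full backup at or before the target.
    start = end
    while start >= 0 and history[start][1] != "full":
        start -= 1
    if start < 0:
        raise ValueError("no full backup precedes the target")
    # First the full image, then the incrementals whose HFiles get bulk loaded.
    return ids[start:end + 1]


history = [
    ("b1", "full"),
    ("b2", "incremental"),
    ("b3", "incremental"),
    ("b4", "full"),
    ("b5", "incremental"),
]
print(images_to_restore(history, "b3"))  # ['b1', 'b2', 'b3']
```

Restoring "b5" in this model would skip b1 through b3 entirely, since b4 is the last full backup before it.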

> Mention HFiles for incremental backups
> --------------------------------------
>
>                 Key: HBASE-29497
>                 URL: https://issues.apache.org/jira/browse/HBASE-29497
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Dieter De Paepe
>            Priority: Minor
>
> Section 11 and 13.2 fail to mention that incremental backups also track 
> bulk-loaded HFiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
