[ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017973#comment-14017973
 ] 

Demai Ni commented on HBASE-7912:
---------------------------------

[~fenghh], many thanks for the comments.

{quote}
"Use case example 1" in page 3: The full backup doesn't contain data of table3 
and table4, so when restoring table3 and table4, their data are all restored 
from the incremental backups, right? Sounds it's not a typical 
scenario(full-backup + incremental backups) for backup/restore.
{quote}
During step c ("... user adds other table ..."), adding the tables actually 
triggers an implicit full backup of table3 and table4. So when they are 
restored in the future, the data will come from both the full and the 
incremental backups. 

{quote}
"4. Full Backup": Does log roll take place after taking (full) snapshot? What 
if new writes arrive after taking snapshot but before log roll?
{quote} 
The logic is to roll the logs first and then take the snapshot. If new writes 
arrive in between, they will be saved in the full backup image, and the same 
writes will be saved again in the next incremental backup. The approach 
ensures no data loss by allowing duplicate puts during restore. 
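
To illustrate why the duplicate puts are harmless, here is a minimal sketch. 
It is not code from the patch: the Put class and the replay method are 
hypothetical stand-ins. It only assumes that a cell is identified by row, 
column and timestamp, so replaying the same write twice lands on the same cell.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DuplicatePutSketch {
    // Hypothetical stand-in for one write; not an HBase class.
    static final class Put {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Put(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Replays writes into an in-memory table keyed by row/column/timestamp.
    // A repeated write overwrites the identical cell, so the result is unchanged.
    static Map<String, String> replay(List<Put> puts) {
        Map<String, String> table = new TreeMap<>();
        for (Put p : puts) {
            String cellKey = p.row + "/" + p.column + "/" + p.timestamp;
            table.put(cellKey, p.value);
        }
        return table;
    }

    public static void main(String[] args) {
        Put beforeRoll = new Put("row1", "cf:a", 100L, "v1");
        // Arrives after the log roll but before the snapshot:
        // present in the full image and again in the next incremental backup.
        Put inBetween = new Put("row2", "cf:a", 200L, "v2");
        Put afterSnapshot = new Put("row3", "cf:a", 300L, "v3");

        Map<String, String> restored =
            replay(Arrays.asList(beforeRoll, inBetween, inBetween, afterSnapshot));
        System.out.println(restored);
    }
}
```

Restoring the full image and then the incremental backup corresponds to the 
list with inBetween appearing twice; the restored table is the same as if it 
had appeared once.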

{quote}
"5. Incremental Backup": What if some RS fails during the log roll procedure so 
that not all current log number are recorded onto ZooKeeper?
{quote}
In such a case, the backup process will abort, and the cleanup logic is the 
same as [HBASE-11172 cancel a backup process | 
https://issues.apache.org/jira/browse/HBASE-11172]. The code will remove the 
incomplete backup image and roll back the ZooKeeper state to that of the 
previous backup. 
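
As a rough sketch of the abort rule (not code from the patch), the following 
only commits the new log numbers when every expected region server has 
reported one. The class, method and parameter names are hypothetical, and the 
maps stand in for the state kept in ZooKeeper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class LogRollAbortSketch {
    // previous:        log number per region server recorded by the last backup
    // rolled:          log numbers reported during the current log roll
    // expectedServers: region servers that must report for the roll to count
    static Map<String, Long> commitOrRollBack(Map<String, Long> previous,
                                              Map<String, Long> rolled,
                                              Set<String> expectedServers) {
        if (!rolled.keySet().containsAll(expectedServers)) {
            // At least one region server failed to report: abort and keep
            // the state of the previous backup untouched.
            return previous;
        }
        // All servers reported: the new state replaces the old entries.
        Map<String, Long> next = new HashMap<>(previous);
        next.putAll(rolled);
        return next;
    }
}
```

The point of the sketch is that a partial roll never leaks into the recorded 
state, so the next incremental backup starts from the last complete one.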

{quote} 
What if some log files are archived/deleted between two incremental backups and 
are not included in any incremental backup? Is it possible?
{quote} 
Good point (also thanks to [~mbertozzi], who pointed out the same problem 
earlier). There is a log cleaner that hasn't been included in the patch yet. 
It is called BackupLogCleaner, extends BaseLogCleanerDelegate, and is 
registered through hbase.master.logcleaner.plugins. It would keep the logs 
that a later incremental backup still needs. The side effect is that, if the 
user doesn't run incremental backups often enough, too many log files are 
left behind. We have a "stop -all" feature that removes all backup tables, 
which also frees up the logs. 
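
A minimal sketch of the decision such a cleaner could make is below. It is 
not the BackupLogCleaner from the patch and does not use the HBase delegate 
API; the class, method and parameter names are hypothetical, and it assumes 
the backup state records the newest log number already backed up per region 
server.

```java
import java.util.Map;

class BackupLogCleanerSketch {
    // lastBackedUp: newest log number per region server that a backup
    // has already captured (assumed bookkeeping, not taken from the patch).
    static boolean isLogDeletable(String server, long logNumber,
                                  Map<String, Long> lastBackedUp) {
        if (lastBackedUp.isEmpty()) {
            // No backup tables are registered (for example after "stop -all"),
            // so the cleaner has no reason to hold any log back.
            return true;
        }
        Long covered = lastBackedUp.get(server);
        // Keep the log unless a backup has already captured it.
        return covered != null && logNumber <= covered;
    }
}
```

Being conservative for servers with no recorded log number is a design choice 
of the sketch: keeping an extra log costs disk space, while deleting one that 
was never backed up loses data.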

Thanks for pointing out the typos. I will fix them in the doc. 


> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
>                 Key: HBASE-7912
>                 URL: https://issues.apache.org/jira/browse/HBASE-7912
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>         Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share with community through this jira. 
> We are leveraging existing hbase snapshot feature, and provide a general 
> solution to common users. Our full backup is using snapshot to capture 
> metadata locally and using exportsnapshot to move data to another cluster; 
> the incremental backup is using offline-WALplayer to backup HLogs; we also 
> leverage global distribution rolllog and flush to improve performance; other 
> added-on values such as convert, merge, progress report, and CLI commands. So 
> that a common user can backup hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detail design document and CLI command will be attached in this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detail implement in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-------------------------------------------------------------------------------------------------------------*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in database is to first take full 
> baseline backup, and then periodically take incremental backup that capture 
> the changes since the full baseline backup. HBase cluster can store massive 
> amount data.  Combination of full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture the changes 
> from the full backup, or from last incremental backup.
> # The user needs to restore table data to a past point of time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of 
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family level backup and 
> restore.
> * Restore to different table names.
> * Support adding additional tables or CF to backup set without interruption 
> of incremental backup schedule.
> * Support rollup/combining of incremental backups into longer period and 
> bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same 
> cluster or across clusters.  It has the flexibility to support backup to 
> other devices and servers in the future.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)
