[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

Demai Ni (JIRA) Tue, 13 May 2014 16:52:28 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996599#comment-13996599
 ]


Demai Ni commented on HBASE-7912:
---------------------------------

[~stack], 

thanks for the comments. 

bq. This doc. with perhaps a little more commentary like it could go into the 
hbase refguide when this feature is committed?
In addition to the CLI pdf I attached to this jira, more complete documentation 
can be found here:  [IBM BigInsights 
2.1.2|http://www-01.ibm.com/support/knowledgecenter/SSPT3X_2.1.2/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_hbase_bkuprestore_overview.html],
 which was officially released in March 2014. We will open source all the 
features related to Backup/Restore from IBM BigInsights. We can move the 
documents to the 'backup' section of the HBase ref book as you suggested, 
certainly after incorporating the comments/suggestions from the community.

About testing, thanks to [~jinghe] for the comment. We already did functional 
and stress testing internally before release. For the current patches, since we 
made some changes per suggestions from the community, additional dev testing is 
being carried out. 

{quote}
bq. We’ll convert/replay the backed-up Hlogs into HFiles for fast incremental 
restore. 
This is interesting. It is done against a cluster or it is just a MR job/tool?
{quote}
~70% of the code logic is from WALPlayer, a MR job against the target cluster. 
The difference is that we don't rely on a live hbase cluster when converting the 
HLogs to HFiles, as the code can access the tableinfo offline. Currently the 
code is only useful for the backup/restore solution. We'd like to open another 
jira for the logic as a general tool/improvement of WALPlayer, and the new jira 
will have a dependency on [HBASE-8083 | https://issues.apache.org/jira/browse/HBASE-8073]. 
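
As a rough illustration of the conversion idea only (a toy in-memory model, not the code in the patches and not the WALPlayer API), the sketch below regroups time-ordered log edits per column family and sorts them into the cell order HFiles use: row ascending, qualifier ascending, newest timestamp first. The class names WalEdit and OfflineWalConverter are hypothetical, and no real WAL or HFile is read or written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one logged edit; not an HBase class.
final class WalEdit {
    final String table;
    final String family;
    final String row;
    final String qualifier;
    final long timestamp;
    final String value;

    WalEdit(String table, String family, String row, String qualifier,
            long timestamp, String value) {
        this.table = table;
        this.family = family;
        this.row = row;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }
}

// Hypothetical converter; assumes all edits fit in memory.
final class OfflineWalConverter {
    // Cell order within one family: row, then qualifier, then newest timestamp first.
    static final Comparator<WalEdit> CELL_ORDER =
        Comparator.comparing((WalEdit e) -> e.row)
            .thenComparing((WalEdit e) -> e.qualifier)
            .thenComparing(Comparator.comparingLong((WalEdit e) -> e.timestamp).reversed());

    // Keeps only the edits of one table, groups them by column family,
    // and sorts each group so it could be written out sequentially.
    static Map<String, List<WalEdit>> convert(List<WalEdit> walEdits, String table) {
        Map<String, List<WalEdit>> byFamily = new TreeMap<>();
        for (WalEdit e : walEdits) {
            if (e.table.equals(table)) {
                byFamily.computeIfAbsent(e.family, f -> new ArrayList<>()).add(e);
            }
        }
        for (List<WalEdit> cells : byFamily.values()) {
            cells.sort(CELL_ORDER);
        }
        return byFamily;
    }

    public static void main(String[] args) {
        List<WalEdit> wal = Arrays.asList(
            new WalEdit("t1", "cf", "row2", "q", 10L, "a"),
            new WalEdit("t1", "cf", "row1", "q", 10L, "b"),
            new WalEdit("t2", "cf", "row1", "q", 11L, "c"),
            new WalEdit("t1", "cf", "row1", "q", 12L, "d"));
        for (WalEdit e : convert(wal, "t1").get("cf")) {
            System.out.println(e.row + "/" + e.qualifier + "@" + e.timestamp + "=" + e.value);
        }
    }
}
```

Sorted per-family output of this kind is what makes a bulk load possible on restore, instead of replaying each edit through the write path.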

bq.What needs to go in first? What should we review first?
Actually, I need suggestions from you and other folks here. 

From the dependency perspective, I'd like to have [Full backup HBase-10900| 
https://issues.apache.org/jira/browse/HBASE-10900] in first, and then 
[incremental backup 
HBase-11085|https://issues.apache.org/jira/browse/HBASE-11085]. Once 
Jerry's [global log roll HBase-11148| 
https://issues.apache.org/jira/browse/HBASE-11148] gets accepted, I will put up 
a patch to update full and incremental to use it immediately.  Then, I would 
like to improve it with protobuf and abstract out zookeeper. 

If the community accepts the general framework provided by [Full 
backup HBase-10900| https://issues.apache.org/jira/browse/HBASE-10900] and  
[incremental backup 
HBase-11085|https://issues.apache.org/jira/browse/HBASE-11085], we will build 
the patches for the other features on top of that framework. 

At this moment, I am thinking about opening another review board for the combined 
patches of [both incremental and full backup | 
https://issues.apache.org/jira/secure/attachment/12644215/HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch].
 

I understand there is a lot of code involved here, and I am open to any 
suggestions to make the review easier for everyone. :-) 

Demai

> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
>                 Key: HBASE-7912
>                 URL: https://issues.apache.org/jira/browse/HBASE-7912
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>         Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share it with the community through this jira. 
> We are leveraging the existing hbase snapshot feature to provide a general 
> solution for common users. Our full backup uses snapshot to capture 
> metadata locally and exportsnapshot to move data to another cluster; 
> the incremental backup uses offline-WALPlayer to back up HLogs; we also 
> leverage globally distributed log roll and flush to improve performance; other 
> added value includes convert, merge, progress report, and CLI commands, so 
> that a common user can back up hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached to this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-------------------------------------------------------------------------------------------------------------*
> *Below is the original description, kept here as the history of the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice for backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store massive 
> amounts of data.  Combining full backups with incremental backups has 
> tremendous benefit for HBase as well.  The following is a typical scenario 
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodic incremental backups to capture the changes 
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of 
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family level backup and 
> restore.
> * Restore to different table names.
> * Support adding additional tables or CF to backup set without interruption 
> of incremental backup schedule.
> * Support rollup/combining of incremental backups into longer period and 
> bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same 
> cluster or across clusters.  It has the flexibility to support backup to 
> other devices and servers in the future.  
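
The restore scenario in the quoted description (restore a full backup, then apply the incremental backups up to the desired point in time) can be sketched as a selection over the backup history. This is a minimal sketch under stated assumptions, not the patch's implementation: BackupImage and RestorePlanner are hypothetical names, the history is assumed to be sorted by completion time, and restore granularity is a whole backup image.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical record of one backup image; not a class from the patches.
final class BackupImage {
    final String id;
    final boolean full;
    final long completedAt;

    BackupImage(String id, boolean full, long completedAt) {
        this.id = id;
        this.full = full;
        this.completedAt = completedAt;
    }
}

// Hypothetical planner that picks the images needed for a point-in-time restore.
final class RestorePlanner {
    // history must be sorted by completedAt ascending.
    // Returns the latest full backup at or before target, followed by every
    // later incremental backup that also completed at or before target.
    static List<BackupImage> plan(List<BackupImage> history, long target) {
        int base = -1;
        for (int i = 0; i < history.size(); i++) {
            BackupImage img = history.get(i);
            if (img.completedAt > target) {
                break;
            }
            if (img.full) {
                base = i;
            }
        }
        if (base < 0) {
            throw new IllegalArgumentException("no full backup at or before " + target);
        }
        List<BackupImage> chain = new ArrayList<>();
        for (int i = base; i < history.size() && history.get(i).completedAt <= target; i++) {
            chain.add(history.get(i));
        }
        return chain;
    }

    public static void main(String[] args) {
        List<BackupImage> history = Arrays.asList(
            new BackupImage("F1", true, 100L),
            new BackupImage("I1", false, 200L),
            new BackupImage("I2", false, 300L),
            new BackupImage("F2", true, 400L),
            new BackupImage("I3", false, 500L));
        for (BackupImage img : plan(history, 350L)) {
            System.out.println(img.id);
        }
    }
}
```

The images are applied in the returned order: the full backup first, then each incremental on top of it.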



--
This message was sent by Atlassian JIRA
(v6.2#6252)
