[
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996599#comment-13996599
]
Demai Ni commented on HBASE-7912:
---------------------------------
[~stack],
thanks for the comments.
bq. This doc. with perhaps a little more commentary like it could go into the
hbase refguide when this feature is committed?
In addition to the CLI PDF I attached to this jira, more complete documentation
can be found here: [IBM BigInsights
2.1.2|http://www-01.ibm.com/support/knowledgecenter/SSPT3X_2.1.2/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_hbase_bkuprestore_overview.html],
which was officially released in March 2014. We will open source all the
features related to Backup/Restore from IBM BigInsights. We can move the
documents to the 'backup' section of the HBase ref book as you suggested,
certainly after incorporating the comments/suggestions from the community.
About testing, thanks to [~jinghe] for the comment. We already did functional
and stress testing internally before the release. For the current patches,
since we made some changes per suggestions from the community, additional dev
testing is being carried out.
{quote}
bq. We’ll convert/replay the backed-up Hlogs into HFiles for fast incremental
restore.
This is interesting. It is done against a cluster or it is just a MR job/tool?
{quote}
~70% of the code logic is from WALPlayer, an MR job that runs against the target
cluster. The difference is that we don't rely on a live HBase cluster when
converting the HLogs to HFiles, as the code can access the tableinfo offline.
Currently the code is only useful for the backup/restore solution. We'd like to
open another jira for the logic as a general tool/improvement of WALPlayer, and
the new jira will have a dependency on
[HBASE-8083 | https://issues.apache.org/jira/browse/HBASE-8073].
bq.What needs to go in first? What should we review first?
Actually, I need suggestions from you and other folks here.
From the dependency perspective, I'd like to have [Full backup HBase-10900|
https://issues.apache.org/jira/browse/HBASE-10900] in first, and then
[incremental backup
HBase-11085|https://issues.apache.org/jira/browse/HBASE-11085]. Once
Jerry's [global log roll HBase-11148|
https://issues.apache.org/jira/browse/HBASE-11148] gets accepted, I will put up
a patch to update full and incremental backup to use it immediately. Then, I
would like to improve it with protobuf and abstract out ZooKeeper.
If the community accepts the general framework provided by [Full
backup HBase-10900| https://issues.apache.org/jira/browse/HBASE-10900] and
[incremental backup
HBase-11085|https://issues.apache.org/jira/browse/HBASE-11085], we will build
the patches for the other features on top of that framework.
At this moment, I am thinking about opening another review board request for
the combined patch of [both incremental and full backup |
https://issues.apache.org/jira/secure/attachment/12644215/HBASE-11085-trunk-v1-contains-HBASE-10900-trunk-v4.patch].
I understand there is a lot of code involved here, and I am open to any
suggestion to make the review easier for everyone. :-)
Demai
> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
> Issue Type: Sub-task
> Reporter: Richard Ding
> Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf,
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and
> would like to share it with the community through this jira.
> We leverage the existing HBase snapshot feature and provide a general
> solution for common users. Our full backup uses snapshot to capture
> metadata locally and ExportSnapshot to move data to another cluster;
> the incremental backup uses offline-WALPlayer to back up HLogs; we also
> leverage distributed log roll and flush to improve performance. Other
> value-adds include convert, merge, progress report, and CLI commands, so
> that a common user can back up HBase data without in-depth knowledge of HBase.
> Our solution also contains some usability features for enterprise users.
> The detailed design document and CLI commands will be attached to this jira.
> We plan to use 10~12 subtasks to share each of the following features, and to
> document the detailed implementation in the subtasks:
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental
> backup)
> * *distributed* Logroll and distributed flush
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup
> * *Convert* incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-------------------------------------------------------------------------------------------------------------*
> *Below is the original description, kept here as the history of the
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase
> backup/restore solution (e.g., HBASE-4618). Recently, there are many
> advancements and new features in HBase, for example, FileLink, Snapshot, and
> Distributed Barrier Procedure. This is a proposal for a backup/restore
> solution that utilizes these new features to achieve better performance and
> consistency.
>
> A common practice for backup and restore in databases is to first take a full
> baseline backup, and then periodically take incremental backups that capture
> the changes since the full baseline backup. An HBase cluster can store massive
> amounts of data. Combining full backups with incremental backups has
> tremendous benefit for HBase as well. The following is a typical scenario
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase.
> # The user schedules periodic incremental backups to capture the changes
> since the full backup, or since the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s).
> Then the incremental backups that are up to the desired point in time are
> applied on top of the full backup.
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family level backup and
> restore.
> * Restore to different table names.
> * Support adding additional tables or CF to backup set without interruption
> of incremental backup schedule.
> * Support rollup/combining of incremental backups into longer period and
> bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same
> cluster or across clusters. It has the flexibility to support backup to
> other devices and servers in the future.
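
The full-plus-incremental restore scenario in the description can be sketched as a small selection routine: pick the most recent full backup at or before the requested time, then apply every later incremental backup up to that time, in order. This is only an illustration of the idea, not code from the patches. `BackupImage`, `restore_chain`, and the `end_ts` field are hypothetical names, and the sketch assumes each backup image records the time at which it finished capturing changes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupImage:
    # Hypothetical record of one backup image; not a class from the patches.
    backup_id: str
    kind: str    # "full" or "incremental"
    end_ts: int  # assumed: time at which the image stopped capturing changes


def restore_chain(images: List[BackupImage], target_ts: int) -> List[BackupImage]:
    """Return the images to apply, in order, to restore up to target_ts."""
    # Only images completed at or before the requested time are usable.
    eligible = sorted(
        (img for img in images if img.end_ts <= target_ts),
        key=lambda img: img.end_ts,
    )
    fulls = [img for img in eligible if img.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the requested time")
    base = fulls[-1]  # most recent full backup is the baseline
    # Incrementals taken after the baseline are applied on top of it.
    incrementals = [
        img for img in eligible
        if img.kind == "incremental" and img.end_ts > base.end_ts
    ]
    return [base] + incrementals


history = [
    BackupImage("b1", "full", 100),
    BackupImage("b2", "incremental", 200),
    BackupImage("b3", "incremental", 300),
    BackupImage("b4", "full", 400),
    BackupImage("b5", "incremental", 500),
]
print([img.backup_id for img in restore_chain(history, 350)])  # ['b1', 'b2', 'b3']
```

Note that in this sketch the restore granularity is one backup image: a request for time 350 restores the state as of the incremental that ended at 300.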