[
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Rodionov updated HBASE-7912:
-------------------------------------
Attachment: HBaseBackupAndRestore - v0.8.pdf
Added command line tool section.
> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
> Issue Type: Sub-task
> Reporter: Richard Ding
> Assignee: Vladimir Rodionov
> Labels: backup
> Fix For: 2.0.0
>
> Attachments: HBaseBackupAndRestore - v0.8.pdf,
> HBaseBackupAndRestore.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf,
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf,
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf,
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> We have completed the implementation of our backup/restore solution and
> would like to share it with the community through this JIRA.
> We leverage the existing HBase snapshot feature to provide a general
> solution for common users. The full backup uses a snapshot to capture
> metadata locally and ExportSnapshot to move data to another cluster; the
> incremental backup uses offline-WALPlayer to back up HLogs. We also use a
> distributed log roll and flush to improve performance, and add further
> features such as convert, merge, progress reporting, and CLI commands, so
> that a common user can back up HBase data without in-depth knowledge of HBase.
> Our solution also contains some usability features for enterprise users.
> The detailed design document and CLI commands will be attached to this JIRA.
> We plan to use 10~12 subtasks to share each of the following features and
> to document the detailed implementation in the subtasks:
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental
> backup)
> * *Distributed* log roll and distributed flush
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup
> * *Convert* incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *Add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * Backup progress *status*
> * Full backup based on an *existing snapshot*
> *-------------------------------------------------------------------------------------------------------------*
> *Below is the original description, kept here as a history of the design
> and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase
> backup/restore solution (e.g., HBASE-4618). Recently, there have been many
> advancements and new features in HBase, for example, FileLink, Snapshot, and
> Distributed Barrier Procedure. This is a proposal for a backup/restore
> solution that utilizes these new features to achieve better performance and
> consistency.
>
> A common practice for backup and restore in databases is to first take a
> full baseline backup, and then periodically take incremental backups that
> capture the changes since the full baseline backup. An HBase cluster can
> store a massive amount of data. Combining full backups with incremental
> backups has tremendous benefit for HBase as well. The following is a
> typical scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase.
> # The user schedules periodic incremental backups to capture the changes
> since the full backup, or since the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s).
> Then the incremental backups that are up to the desired point in time are
> applied on top of the full backup.
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but bulk-load HFiles for
> fast incremental restore.
> * Support a single table or a set of tables, and column-family-level backup
> and restore.
> * Restore to different table names.
> * Support adding tables or column families to the backup set without
> interrupting the incremental backup schedule.
> * Support rollup/combining of incremental backups into longer-period,
> larger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same
> cluster or across clusters. It has the flexibility to support backup to
> other devices and servers in the future.
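
As an illustration of the restore scenario in the original description (restore the full backup, then apply the incremental backups up to the desired point in time), here is a minimal Python sketch of how the backup images to apply might be chosen. It is not taken from the attached design documents or from the HBase code base: `BackupImage`, `select_restore_chain`, and the plain integer timestamps are hypothetical names and simplifications introduced only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BackupImage:
    """Hypothetical record for one backup image (not an HBase class)."""
    backup_id: str
    kind: str        # "full" or "incremental"
    timestamp: int   # completion time, e.g. seconds since the epoch


def select_restore_chain(images: Iterable[BackupImage],
                         restore_point: int) -> List[BackupImage]:
    """Return the images to apply, in order, to restore to restore_point.

    The chain starts at the latest full backup taken at or before the
    restore point and continues with every incremental backup taken after
    that full backup, up to and including the restore point.
    """
    # Only images completed at or before the restore point are usable.
    candidates = sorted(
        (img for img in images if img.timestamp <= restore_point),
        key=lambda img: img.timestamp,
    )

    # Locate the most recent full backup among the candidates.
    last_full = None
    for index, img in enumerate(candidates):
        if img.kind == "full":
            last_full = index

    if last_full is None:
        raise ValueError("no full backup at or before the restore point")

    # Everything after the last full backup is incremental by construction.
    return candidates[last_full:]
```

In this sketch, the first element of the returned list is the full backup to restore and the remaining elements are the incremental backups to apply on top of it, oldest first.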