[
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017538#comment-14017538
]
Honghua Feng commented on HBASE-7912:
-------------------------------------
Just finished reading the design doc
"HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf". It's a good enhancement and
extension to the current data backup/restore options, and the design doc
is concise and clear :-)
Some comments:
# "Use case example 1" in page 3: The full backup doesn't contain data of
table3 and table4, so when restoring table3 and table4, their data is
restored entirely from the incremental backups, right? That doesn't sound like
a typical scenario (full backup + incremental backups) for backup/restore.
# "4. Full Backup": Does the log roll take place after taking the (full)
snapshot? What if new writes arrive after taking the snapshot but before the
log roll?
# "5. Incremental Backup": What if some RS (region server) fails during the
log roll procedure so that not all current log numbers are recorded in
ZooKeeper?
# What if some log files are archived/deleted between two incremental backups
and are not included in any incremental backup? Is it possible?
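To make the concern in the "4. Full Backup" question concrete, here is a minimal sketch. It is a toy model, not HBase code, and it does not claim to describe what the design doc actually does. It assumes (my assumptions) that the snapshot contains every write with a sequence number up to snapshot_seq, and that the first incremental backup only reads WALs created by the log roll, i.e. writes with a sequence number greater than roll_seq. The function name and parameters are hypothetical.

```python
def uncovered_writes(write_seqs, snapshot_seq, roll_seq):
    """Return writes captured by neither the snapshot nor the post-roll WALs.

    Toy model (assumptions, not HBase behaviour):
      - the snapshot holds every write with seq <= snapshot_seq
      - incremental backup reads only writes with seq > roll_seq
    """
    return [s for s in write_seqs if snapshot_seq < s <= roll_seq]


writes = list(range(1, 11))  # ten writes with sequence numbers 1..10

# Snapshot first (seq 5), log roll later (seq 7): writes 6 and 7 sit in the
# pre-roll WAL and are covered by neither the snapshot nor the incremental.
gap_if_snapshot_first = uncovered_writes(writes, snapshot_seq=5, roll_seq=7)

# Log roll first (seq 5), snapshot later (seq 7): writes 6 and 7 are covered
# by both, so nothing is missed (at the cost of replaying some writes twice).
gap_if_roll_first = uncovered_writes(writes, snapshot_seq=7, roll_seq=5)
```

Under these assumptions, rolling the log before taking the snapshot leaves no gap, while the opposite order can miss the writes in between. Whether that applies depends on how the design actually orders the two steps.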
Some (possible) typos in the design doc:
# "2. Key features and Use Cases": "Full back uses HBase..." => "Full backup
uses HBase..."
# "5. Incremental Backup": "kicks of a global..." => "kicks off a global..."
# "5. Incremental Backup": "Incremental backups and also be..." => "Incremental
backups can also be..."
> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
> Key: HBASE-7912
> URL: https://issues.apache.org/jira/browse/HBASE-7912
> Project: HBase
> Issue Type: Sub-task
> Reporter: Richard Ding
> Assignee: Richard Ding
> Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf,
> HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and
> would like to share it with the community through this JIRA.
> We leverage the existing HBase snapshot feature to provide a general
> solution for common users. Our full backup uses a snapshot to capture
> metadata locally and ExportSnapshot to move data to another cluster;
> the incremental backup uses offline-WALPlayer to back up HLogs; we also
> leverage globally distributed log roll and flush to improve performance.
> Other added value includes convert, merge, progress reporting, and CLI
> commands, so that a common user can back up HBase data without in-depth
> knowledge of HBase. Our solution also contains some usability features for
> enterprise users.
> The detailed design document and CLI commands will be attached to this JIRA.
> We plan to use 10~12 subtasks to share each of the following features, and
> document the detailed implementation in the subtasks:
> * *Full Backup* : provide local and remote backup/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental
> backup)
> * *distributed* Logroll and distributed flush
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup
> * *Convert* incremental backup WAL files into hfiles
> * *Merge* several backup images into one (like merging weekly into monthly)
> * *add and remove* tables to and from a backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-------------------------------------------------------------------------------------------------------------*
> *Below is the original description, kept here as history of the
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase
> backup/restore solution (e.g., HBASE-4618). Recently, there have been many
> advancements and new features in HBase, for example, FileLink, Snapshot, and
> Distributed Barrier Procedure. This is a proposal for a backup/restore
> solution that utilizes these new features to achieve better performance and
> consistency.
>
> A common practice for backup and restore in databases is to first take a full
> baseline backup, and then periodically take incremental backups that capture
> the changes since the full baseline backup. An HBase cluster can store a
> massive amount of data. Combining full backups with incremental backups has
> tremendous benefit for HBase as well. The following is a typical scenario
> for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase.
> # The user schedules periodic incremental backups to capture the changes
> from the full backup, or from the last incremental backup.
> # The user needs to restore table data to a past point in time.
> # The full backup is restored to the table(s) or to different table name(s).
> Then the incremental backups that are up to the desired point in time are
> applied on top of the full backup.
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family level backup and
> restore.
> * Restore to different table names.
> * Support adding tables or CFs to the backup set without interrupting
> the incremental backup schedule.
> * Support rollup/combining of incremental backups into larger incremental
> backups that cover longer periods.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same
> cluster or across clusters. It has the flexibility to support backup to
> other devices and servers in the future.