[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778724#action_12778724 ]
Alex Newman commented on HBASE-50:
----------------------------------

Say you flushed the logs, ran a compaction, and waited for the cluster to chill out. Unless you had extremely high churn rates, I would suggest a MapReduce job with one region per task, where a task fails and retries in the case of flushes or compactions. In the case of a split you can fail the job, disable splitting, or have some way of picking up the daughter regions later. Even though the data would be spread across a window of time, you would at least have a timestamp of when the backup was made, i.e. consistency at the region level, which is all that a lot of us really want. A sketch of such a job follows the quoted issue below.

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Alex Newman
>            Priority: Minor
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like this option to do is merge all of the data into one or
> more files stored in the same folder on the DFS. That way we could save
> the data in case of a software bug in Hadoop or user code.
> The other advantage would be the ability to export a table to multiple
> locations. Say I had a read-only table that must be online. I could take
> a snapshot of it when needed, export it to a separate data center, and
> have it loaded there; then I would have it online at multiple data
> centers for load balancing and failover.
> I understand that Hadoop removes the need for backups to protect against
> failed servers, but that does not protect us from software bugs that
> might delete or alter data in ways we did not plan. We should have a way
> to roll back a dataset.
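A minimal sketch of the per-region, map-only job described in the comment, assuming HBase's TableMapper / TableMapReduceUtil MapReduce API (TableInputFormat yields one input split per region, so each map task scans exactly one region, and a task that races a flush or compaction simply fails and is retried by the framework). The class names RegionBackup and BackupMapper and the two command-line arguments (table name, output directory) are illustrative only; split handling is not shown, so this assumes splitting has been disabled for the duration of the run, as suggested above.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class RegionBackup {

  // Identity mapper: writes each row back out unchanged. Because
  // TableInputFormat creates one split per region, each task copies
  // exactly one region, giving region-level consistency.
  static class BackupMapper
      extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row,
        Context context) throws IOException, InterruptedException {
      context.write(rowKey, row);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Embed a timestamp in the job name so we know when the backup ran.
    Job job = Job.getInstance(conf,
        "backup-" + args[0] + "-" + System.currentTimeMillis());
    job.setJarByClass(RegionBackup.class);

    Scan scan = new Scan();
    scan.setCacheBlocks(false); // a full scan shouldn't pollute the block cache

    TableMapReduceUtil.initTableMapperJob(args[0], scan, BackupMapper.class,
        ImmutableBytesWritable.class, Result.class, job);

    job.setNumReduceTasks(0); // map-only: no cross-region shuffle needed
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
{code}

The fail-and-retry behavior falls out of MapReduce's normal task-attempt handling (mapreduce.map.maxattempts), so only the affected region is rescanned rather than the whole table.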