[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636725#action_12636725 ]
Clint Morgan commented on HBASE-50:
-----------------------------------

I've been thinking about this recently. I'd like to be able to take a snapshot backup of all of our tables, and a requirement here is that the snapshot be consistent. What this means with respect to transactions is that either none or all of a transaction makes it into the snapshot (atomicity).

Another requirement is to minimize the time we have to be read-only. I'd like to keep it on the order of a few seconds.

My first thinking was along the lines of what Stack suggested above: go to read-only, flush, then copy the files. As I understand it, I could go back to allowing writes as soon as the memcache flush begins. The subsequent writes would just go to memory....

Then I realized that once we have proper appending to the write-ahead log (HLog), I can simply copy that log over rather than doing the memcache flush.

So I was thinking it would work roughly like this. (I use the term "message" generically here. Originally I was thinking this could all be orchestrated by passing around HMsgs with the normal mechanism, but now I think it would be better to do it with explicit RPC calls to speed things up.)

- Master sends RegionServers a BeginSnapshot message.
- RegionServers receive BeginSnapshot, put their regions into read-only mode, and prevent flushes/compactions/splits.
- Commit-pending transactions (i.e., transactions which we have voted to commit but have not committed yet) for a region are allowed to finish. This is needed to ensure atomicity. The time that transactions are commit-pending should be very small.
- After all commit-pending transactions have completed, the Region moves the write-ahead logger to a new file. The old one(s) will be copied in the snapshot. When all regions in a RegionServer are ready, it sends a CopyOk message to the Master. This means that our HDFS files are ready to be copied.
- After all RegionServers have sent the CopyOk message, the Master sends a WritesOk message to all RegionServers and begins the HDFS copy.
- When Regions get the WritesOk message, they can allow writes to the memcache and the new WAL. (If they need to spill to disk, then we have to handle that specially: either abort the snapshot, or spill to something that won't be included in the snapshot.)
- After the HDFS copy is done, the Master sends a SnapshotComplete message. This tells the RegionServers that they can start spilling to disk again.

So how does this sound? It seems I can avoid the memcache flush if I really trust my WAL. And it seems I should be able to keep the read-only time fairly low. Any problems I'm not seeing?

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Priority: Minor
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like to see this option do is a merge of all the data into
> one or more files stored in the same folder on the DFS. This way we could
> save data in case of a software bug in Hadoop or user code.
> The other advantage would be to be able to export a table to multiple
> locations. Say I had a read-only table that must be online. I could take a
> snapshot of it when needed and export it to a separate data center and have
> it loaded there, and then I would have it online at multiple data centers
> for load balancing and failover.
> I understand that Hadoop removes the need for backups to protect against
> failed servers, but this does not protect us from software bugs that
> might delete or alter data in ways we did not plan. We should have a way to
> roll back a dataset.
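
For reference, the message ordering proposed in the comment above can be sketched in Java as follows. This is only an illustration of the handshake, not HBase code: SnapshotSketch, RegionServerStub, Msg, and runSnapshot are hypothetical names, the RPC calls are modelled as plain method calls, and the HDFS copy is a placeholder step. In the proposal the copy would run while writes are already flowing again; here the steps are simply run in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these types exist in HBase. */
public class SnapshotSketch {

    /** The four messages named in the proposal. */
    enum Msg { BEGIN_SNAPSHOT, COPY_OK, WRITES_OK, SNAPSHOT_COMPLETE }

    /** Hypothetical stand-in for a RegionServer and its regions. */
    static class RegionServerStub {
        boolean readOnly = false;      // true while writes are rejected
        boolean spillAllowed = true;   // flushes/compactions/splits permitted
        int commitPending;             // transactions voted to commit, not yet committed
        int walGeneration = 0;         // bumped each time the WAL moves to a new file

        RegionServerStub(int commitPending) {
            this.commitPending = commitPending;
        }

        /** Applies one master message; returns the reply, or null if none is sent. */
        Msg handle(Msg m) {
            switch (m) {
                case BEGIN_SNAPSHOT:
                    readOnly = true;
                    spillAllowed = false;
                    commitPending = 0;   // assumption: pending commits finish quickly
                    walGeneration++;     // old WAL file(s) are now stable for copying
                    return Msg.COPY_OK;
                case WRITES_OK:
                    readOnly = false;    // writes go to memcache and the new WAL
                    return null;
                case SNAPSHOT_COMPLETE:
                    spillAllowed = true; // spilling to disk is safe again
                    return null;
                default:
                    throw new IllegalArgumentException("unexpected message: " + m);
            }
        }
    }

    /** Master side of the handshake; returns the ordered list of steps taken. */
    static List<String> runSnapshot(List<RegionServerStub> servers) {
        List<String> trace = new ArrayList<>();
        for (RegionServerStub rs : servers) {
            if (rs.handle(Msg.BEGIN_SNAPSHOT) != Msg.COPY_OK) {
                throw new IllegalStateException("region server not ready for copy");
            }
        }
        trace.add("all-copy-ok");
        for (RegionServerStub rs : servers) {
            rs.handle(Msg.WRITES_OK);
        }
        trace.add("writes-ok-sent");
        trace.add("hdfs-copy");          // placeholder for copying store files and old WALs
        for (RegionServerStub rs : servers) {
            rs.handle(Msg.SNAPSHOT_COMPLETE);
        }
        trace.add("snapshot-complete-sent");
        return trace;
    }

    public static void main(String[] args) {
        List<RegionServerStub> servers =
            Arrays.asList(new RegionServerStub(2), new RegionServerStub(0));
        System.out.println(runSnapshot(servers));
    }
}
```

The read-only window in this sketch is the interval between BEGIN_SNAPSHOT and WRITES_OK, which covers only finishing commit-pending transactions and rolling the WAL, not the HDFS copy itself.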