The backup tool takes snapshots of HFiles on a per-region basis. Before copying anything, we flush the region and then list all of its files at that moment. If we copy a region successfully, we assume its files are consistent for that region, because HFiles are immutable. If we can't copy an entire region, that region is marked as failed and retried later. As a result, the HFile snapshots may be taken at different times for different regions, so the backup tool alone can't guarantee consistent table snapshots. This is why we also use WALPlayer, which replays the logs up to the time we wish to restore to.
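To make the per-region flow above concrete, here is a minimal sketch in Python. It is not the tool's actual code: `backup_table`, `replay_window`, `RegionSnapshot`, and the `flush`, `list_files`, and `copy_file` callbacks are hypothetical names standing in for whatever the real tool does against HBase and HDFS. The replay window (earliest region snapshot time up to the restore time) is also an assumption about how the log replay would be bounded, not something confirmed by the tool.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RegionSnapshot:
    region: str
    taken_at: float      # time the file list was captured, after the flush
    files: List[str]     # immutable HFiles that were copied


def backup_table(
    regions: List[str],
    flush: Callable[[str], None],            # hypothetical: flush one region
    list_files: Callable[[str], List[str]],  # hypothetical: list its HFiles
    copy_file: Callable[[str, str], None],   # hypothetical: copy one HFile
    max_attempts: int = 3,
    clock: Callable[[], float] = time.time,
) -> Tuple[Dict[str, RegionSnapshot], List[str]]:
    """Copy each region independently; retry regions that failed."""
    snapshots: Dict[str, RegionSnapshot] = {}
    pending = list(regions)
    for _ in range(max_attempts):
        failed: List[str] = []
        for region in pending:
            flush(region)                 # flush before copying anything
            files = list_files(region)    # file list as of this moment
            taken_at = clock()
            try:
                for name in files:
                    copy_file(region, name)
            except OSError:
                # A partial copy is not trusted: fail the whole region.
                failed.append(region)
                continue
            snapshots[region] = RegionSnapshot(region, taken_at, files)
        pending = failed
        if not pending:
            break
    return snapshots, pending  # pending holds regions that never succeeded


def replay_window(
    snapshots: Dict[str, RegionSnapshot], restore_to: float
) -> Tuple[float, float]:
    """Assumed log replay range: earliest region snapshot to restore time."""
    start = min(s.taken_at for s in snapshots.values())
    return start, restore_to
```

Because a retried region gets a fresh flush and a fresh file list, its `taken_at` ends up later than that of regions copied on the first pass. That spread in timestamps is why the HFile copies alone don't form a consistent table snapshot and a log replay step is needed on top.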

Odds of something not recoverable? That's a very good question. So far we haven't had a non-recoverable backup with our most current version. But honestly, I don't know yet. We've only recently completed this tool and are still testing it. With my limited knowledge of HBase, it's likely that I missed something, and that's one of the reasons we are releasing it, so anyone interested can test it out, verify or break its logic, and suggest improvements. In fact, I also included the algorithm in our GitHub wiki page for this reason. Feel free to review it for accuracy.

In the meantime, I'll work on a better answer for this :) Hopefully, we can find out soon.

Thanks
Carlos

-----Original Message-----
From: Vladimir Rodionov [mailto:[email protected]]
Sent: Tuesday, May 08, 2012 7:38 PM
To: [email protected]
Subject: RE: HBase backup option

Carlos,

How did you achieve consistency? Are table snapshots consistent? If not, what are the odds to get something not recoverable?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Espinoza,Carlos [[email protected]]
Sent: Tuesday, May 08, 2012 1:46 PM
To: [email protected]
Subject: HBase backup option

Hi,

I was asked to mention this on the mailing list. At OCLC, we are working towards moving our data to HBase. A huge requirement is to back up our data, obviously, and seeing that this is still a work in progress, we decided to write something for ourselves. Using all the resources that we found available on Jira, the HBase book, etc., we came up with a few tools that we are currently testing and using. We were able to upload them to GitHub, so here it is:

https://github.com/oclc/HBase-Backup

So far, they have been great for us. If anyone is interested in giving it a try, please go ahead; any feedback would be greatly appreciated.

Thanks!
Carlos
