[jira] Commented: (HBASE-50) Snapshot of table

Clint Morgan (JIRA) Fri, 03 Oct 2008 13:10:36 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636725#action_12636725
 ]


Clint Morgan commented on HBASE-50:
-----------------------------------

I've been thinking about this recently. I'd like to be able to take a
snapshot backup of all of our tables, and a requirement
here is that the snapshot be consistent. What this means with respect
to transactions is that either none or all of a transaction makes it
into the snapshot (atomicity).

Another requirement is to keep the time we have to be read-only as
short as possible. I'd like to keep it on the order of a few seconds.

My first thought was along the lines of what Stack suggested above:
Go to read-only, flush, then copy the files. As I understand it, I
could go back to allowing writes as soon as the memcache flush
begins. The subsequent writes would just go to memory....
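
To illustrate why writes would only need to be blocked until the flush
begins, here is a minimal sketch. It assumes the flush works on a frozen
copy of the in-memory store while a fresh one takes new writes.
MemcacheSketch and its methods are hypothetical names for illustration,
not HBase's actual memcache code.

```java
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region's in-memory store (not an HBase class).
final class MemcacheSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();

    // Writers share the read lock, so they only wait while a swap is in progress.
    void put(String row, String value) {
        lock.readLock().lock();
        try {
            active.put(row, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The only moment writes are blocked: swap in an empty map and hand
    // the frozen one to whoever flushes it to disk.
    SortedMap<String, String> beginFlush() {
        lock.writeLock().lock();
        try {
            ConcurrentSkipListMap<String, String> frozen = active;
            active = new ConcurrentSkipListMap<>();
            return frozen;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Number of entries written since the last swap.
    int activeSize() {
        lock.readLock().lock();
        try {
            return active.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Under this assumption, the read-only window is just the time it takes to
swap the map, and writes arriving afterwards never touch the frozen data
that is being flushed and copied.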

Then I realized that once we have proper appending to the
write-ahead-log (HLog), then I can simply copy that log over rather
than doing the memcache flush.

So I was thinking it would work roughly like this: (I use the term
message generically here. Originally I was thinking this could all be
orchestrated by passing around HMsgs with the normal mechanism, but
now I think it would be better to do it with explicit RPC calls to
speed things up.)


- Master sends RegionServers a BeginSnapshot message

- RegionServers receive BeginSnapshot and put their regions into
  read-only mode, and prevent flushes/compactions/splits.

- Commit-pending transactions (i.e., transactions which we have voted to
  commit, but not committed yet) for a region are allowed to
  finish. This is needed to ensure atomicity. The time that
  transactions are commit-pending should be very small.

- After all commit-pending transactions have completed, the Region
  moves the write-ahead log to a new file. The old one(s) will be
  copied in the snapshot. When all regions in a RegionServer are
  ready, it sends a CopyOk message to the Master. This means that our
  HDFS files are ready to be copied.

- After all RegionServers have sent the CopyOk message, the
  Master sends a WritesOk message to all RegionServers, and begins
  the HDFS copy.

- When Regions get the WritesOk message, they can allow writes to the
  memcache and new WAL. (If they need to spill to disk then we have
  to handle that specially. Either abort the snapshot, or spill to
  something that won't be included in the snapshot.)

- After the HDFS copy is done, the Master sends a
  SnapshotComplete message. This tells the RegionServers that they can
  start spilling to disk again.
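
The steps above can be sketched as a small single-process simulation.
This is only an illustration under stated assumptions: SnapshotSketch,
RegionServer, Mode and every method name here are hypothetical, the RPCs
are plain method calls, and the HDFS copy is reduced to a logged event.
Real code would wait on in-flight commits and copy files asynchronously.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of the proposed message flow (not HBase code).
final class SnapshotSketch {
    enum Mode { NORMAL, READ_ONLY, MEMORY_ONLY_WRITES }

    static final class RegionServer {
        final String name;
        Mode mode = Mode.NORMAL;
        int commitPending;      // transactions voted to commit but not yet committed
        int walGeneration = 0;  // bumped each time the WAL moves to a new file

        RegionServer(String name, int commitPending) {
            this.name = name;
            this.commitPending = commitPending;
        }

        // BeginSnapshot: go read-only, let commit-pending transactions finish,
        // roll the WAL to a new file, then report CopyOk.
        void beginSnapshot(List<String> events) {
            mode = Mode.READ_ONLY;
            while (commitPending > 0) {
                commitPending--;  // stand-in for waiting on a real commit
            }
            walGeneration++;
            events.add(name + ":CopyOk");
        }

        // WritesOk: accept writes again, but only into memory and the new WAL.
        void writesOk(List<String> events) {
            mode = Mode.MEMORY_ONLY_WRITES;
            events.add(name + ":WritesOk");
        }

        // SnapshotComplete: spilling to disk is allowed again.
        void snapshotComplete(List<String> events) {
            mode = Mode.NORMAL;
            events.add(name + ":SnapshotComplete");
        }

        boolean acceptsWrites() { return mode != Mode.READ_ONLY; }
        boolean maySpillToDisk() { return mode == Mode.NORMAL; }
    }

    // Master side: every server must report CopyOk before any gets WritesOk.
    static List<String> runSnapshot(List<RegionServer> servers) {
        List<String> events = new ArrayList<>();
        for (RegionServer rs : servers) rs.beginSnapshot(events);
        for (RegionServer rs : servers) rs.writesOk(events);
        events.add("master:HdfsCopy");  // placeholder for copying store files and old WALs
        for (RegionServer rs : servers) rs.snapshotComplete(events);
        return events;
    }

    public static void main(String[] args) {
        List<RegionServer> servers =
            Arrays.asList(new RegionServer("rs1", 2), new RegionServer("rs2", 0));
        for (String event : runSnapshot(servers)) {
            System.out.println(event);
        }
    }
}
```

In this sketch the read-only window for a server runs from BeginSnapshot
until WritesOk, so it is bounded by the slowest server's commit-pending
drain plus the WAL roll, not by the duration of the HDFS copy.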


So how does this sound? It seems I can avoid the memcache flush if I
really trust my WAL. And it seems I should be able to keep the
read-only time fairly low. Any problems I'm not seeing?


> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Priority: Minor
>
> Having an option to take a snapshot of a table would be very useful in 
> production.
> What I would like to see this option do is do a merge of all the data into 
> one or more files stored in the same folder on the dfs. This way we could 
> save data in case of a software bug in hadoop or user code. 
> The other advantage would be to be able to export a table to multiple locations. 
> Say I had a read_only table that must be online. I could take a snapshot of 
> it when needed and export it to a separate data center and have it loaded 
> there and then I would have it online at multiple data centers for load 
> balancing and failover.
> I understand that hadoop takes the need out of having backups to protect 
> from failed servers, but this does not protect us from software bugs that 
> might delete or alter data in ways we did not plan. We should have a way we 
> can roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
