[
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878446#action_12878446
]
Todd Lipcon commented on HBASE-50:
----------------------------------
Finally had a chance to look over the doc. Great to see such a thorough writeup
on the plan! A few thoughts:
- Snapshot creation: rather than making every region server (RS) roll its logs,
each RS could simply record its log sequence number at the time of the
snapshot, right? This would be a bit faster and cause even less of a "hiccup"
in concurrent operations (and I don't think it's any more complicated to
implement, is it?)
- Snapshot restore: I do think it's a good idea to allow snapshot restore to a
new table name while the original table is still online. And the restored
snapshot should be able to share HFiles with the original table.
- Making the client orchestrate the snapshot process seems a little strange -
could the client simply initiate it and put the actual snapshot code in the
master? I think we should keep the client as thin as we can (in the future we
may want to implement clients natively in other languages)
- I'd be interested in a section on failure analysis: what happens when the
snapshot coordinator fails in the middle? You briefly touched on this, but it
would be good to enumerate the different points where a failure can happen
and show that the operation is correctly aborted and that you don't end up with
an HFile "reference leak".
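
To make the first and last points a bit more concrete, here is a rough Java
sketch. It is not taken from the design doc or from HBase: every name in it
(SnapshotSketch, HFileRefCounts, RegionServer, StubServer, takeSnapshot) is
hypothetical. It assumes each region server can report its current log
sequence number and its list of HFiles, and it only shows the shape of the
idea: record sequence numbers instead of rolling logs, and release every
HFile reference taken so far if any step fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are real HBase APIs. */
final class SnapshotSketch {

    /** Counts how many snapshots reference each HFile (assumed bookkeeping). */
    static final class HFileRefCounts {
        private final Map<String, Integer> counts = new HashMap<>();

        void acquire(String hfile) { counts.merge(hfile, 1, Integer::sum); }

        void release(String hfile) {
            // Decrement, removing the entry once the count would reach zero.
            counts.computeIfPresent(hfile, (k, v) -> v > 1 ? v - 1 : null);
        }

        int count(String hfile) { return counts.getOrDefault(hfile, 0); }
    }

    /** Minimal stand-in for what the coordinator needs from a region server. */
    interface RegionServer {
        String name();
        long currentLogSequenceNumber(); // may throw if the server is unreachable
        List<String> hfiles();
    }

    /** Stub region server used only to exercise the sketch. */
    static final class StubServer implements RegionServer {
        private final String name;
        private final long seq;
        private final List<String> files;
        private final boolean failing;

        StubServer(String name, long seq, List<String> files, boolean failing) {
            this.name = name; this.seq = seq; this.files = files; this.failing = failing;
        }
        public String name() { return name; }
        public long currentLogSequenceNumber() {
            if (failing) throw new IllegalStateException(name + " unreachable");
            return seq;
        }
        public List<String> hfiles() { return files; }
    }

    /** What a completed snapshot records: per-server sequence numbers and HFiles. */
    static final class Snapshot {
        final Map<String, Long> logSeqNums;
        final List<String> hfiles;
        Snapshot(Map<String, Long> logSeqNums, List<String> hfiles) {
            this.logSeqNums = logSeqNums; this.hfiles = hfiles;
        }
    }

    /** Records sequence numbers (no log roll) and references each HFile. */
    static Snapshot takeSnapshot(List<RegionServer> servers, HFileRefCounts refs) {
        Map<String, Long> seqNums = new HashMap<>();
        List<String> acquired = new ArrayList<>();
        try {
            for (RegionServer rs : servers) {
                seqNums.put(rs.name(), rs.currentLogSequenceNumber());
                for (String f : rs.hfiles()) {
                    refs.acquire(f);
                    acquired.add(f);
                }
            }
            return new Snapshot(seqNums, acquired);
        } catch (RuntimeException e) {
            // Abort: undo every reference taken so far so nothing leaks.
            for (String f : acquired) refs.release(f);
            throw e;
        }
    }
}
```

The sketch only covers a failure while collecting from region servers. A
coordinator crash would need the same accounting to be recoverable from
persistent state, which is exactly the part I'd like to see spelled out.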
> Snapshot of table
> -----------------
>
> Key: HBASE-50
> URL: https://issues.apache.org/jira/browse/HBASE-50
> Project: HBase
> Issue Type: New Feature
> Reporter: Billy Pearson
> Assignee: Li Chongxin
> Priority: Minor
> Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot
> Design Report V3.pdf, snapshot-src.zip
>
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like this option to do is merge all the data into
> one or more files stored in the same folder on the DFS. This way we could
> save data in case of a software bug in Hadoop or user code.
> The other advantage would be the ability to export a table to multiple
> locations.
> Say I had a read_only table that must be online. I could take a snapshot of
> it when needed and export it to a separate data center and have it loaded
> there, and then I would have it online at multiple data centers for load
> balancing and failover.
> I understand that Hadoop removes the need for backups to protect
> against failed servers, but this does not protect us from software bugs that
> might delete or alter data in ways we did not plan. We should have a way to
> roll back a dataset.