[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880093#action_12880093
 ] 

Li Chongxin commented on HBASE-50:
----------------------------------

bq. Fail with a warning. A nice-to-have would be your suggestion of restoring 
snapshot into a table named something other than the original table's name 
(Fixing this issue is low-priority IMO).
bq. .. it's a good idea to allow snapshot restore to a new table name while the 
original table is still online. And the restored snapshot should be able to 
share HFiles with the original table

I will make this issue a low-priority sub-task. One more question: besides 
the metadata and log files, what other data needs to be taken care of when 
renaming the snapshot to a new table name? Are there any other files (e.g. 
HFiles) that contain the table name?

bq. ... didn't we discuss that .META. might not be the place to keep snapshot 
data since regions are deleted when the system is done w/ them (but a snapshot 
may outlive a particular region).

I misunderstood... I thought you were talking about creating a new catalog table 
'snapshot' to keep the metadata of snapshots, such as creation time.
In the current design, a region will not be deleted if it is still used by a 
snapshot, even if the system is done with it. Such a region would probably be 
marked as 'deleted' in .META. This is discussed in sections 6.2 and 6.3, and no 
new catalog table is added. Do you think it is appropriate to keep metadata in 
.META. for a deleted region? Do we still need a new catalog table?
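
A minimal sketch of the behavior described above, for discussion only. It is 
written in Python rather than against any HBase API. The class RegionCatalog 
and all of its methods are hypothetical names, and the in-memory dictionaries 
merely stand in for .META.; the actual mechanism in sections 6.2 and 6.3 of the 
design report may differ.

```python
from typing import Dict, Iterable, Set


class RegionCatalog:
    """Toy in-memory stand-in for .META. (hypothetical, not an HBase class)."""

    def __init__(self) -> None:
        # region name -> state, either "online" or "deleted"
        self.regions: Dict[str, str] = {}
        # snapshot name -> names of the regions it references
        self.snapshots: Dict[str, Set[str]] = {}

    def add_region(self, region: str) -> None:
        self.regions[region] = "online"

    def take_snapshot(self, name: str, regions: Iterable[str]) -> None:
        self.snapshots[name] = set(regions)

    def _referenced(self, region: str) -> bool:
        return any(region in refs for refs in self.snapshots.values())

    def delete_region(self, region: str) -> None:
        """Drop the region, or only mark it 'deleted' if a snapshot uses it."""
        if region not in self.regions:
            return
        if self._referenced(region):
            self.regions[region] = "deleted"
        else:
            del self.regions[region]

    def delete_snapshot(self, name: str) -> None:
        """Remove a snapshot and purge regions that were kept only for it."""
        released = self.snapshots.pop(name, set())
        for region in released:
            if self.regions.get(region) == "deleted" and not self._referenced(region):
                del self.regions[region]
```

In this sketch a region marked 'deleted' stays in the catalog until the last 
snapshot referencing it is removed, which is exactly the case the question 
above is about.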

bq. rather than causing all of the RS to roll the logs, they could simply 
record the log sequence number of the snapshot, right? This will be a bit 
faster to do and causes even less of a "hiccup" in concurrent operations (and I 
don't think it's any more complicated to implement, is it?)

Yes, sounds good. The log sequence number should also be included when the logs 
are split, because the log files would contain data from both before and after 
the snapshot, right?
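
A rough sketch of what using the recorded sequence number during log splitting 
could look like, in Python for brevity. LogEntry and split_at_snapshot are 
hypothetical names, not HBase classes, and treating the recorded sequence 
number as inclusive (an edit with exactly that number belongs to the snapshot) 
is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical stand-in for one edit in a region server log."""
    sequence_number: int
    region: str
    payload: str


def split_at_snapshot(
    entries: Iterable[LogEntry], snapshot_seq: int
) -> Tuple[List[LogEntry], List[LogEntry]]:
    """Partition edits into those covered by the snapshot and those after it.

    Assumption: edits with sequence_number <= snapshot_seq belong to the
    snapshot; everything else was written after the snapshot was taken.
    """
    before: List[LogEntry] = []
    after: List[LogEntry] = []
    for entry in entries:
        if entry.sequence_number <= snapshot_seq:
            before.append(entry)
        else:
            after.append(entry)
    return before, after
```

With this approach no log roll is needed: a single log file can hold edits from 
both sides of the snapshot, and the sequence number alone decides which edits 
are replayed when the snapshot is restored.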

bq. Making the client orchestrate the snapshot process seems a little strange - 
could the client simply initiate it and put the actual snapshot code in the 
master? I think we should keep the client as thin as we can

OK, this will change the design a little.

bq. I'd be interested in a section about failure analysis - what happens when 
the snapshot coordinator fails in the middle? ..

That will be great!

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot 
> Design Report V3.pdf, snapshot-src.zip
>
>
> Having an option to take a snapshot of a table would be very useful in 
> production.
> What I would like to see this option do is merge all the data into 
> one or more files stored in the same folder on the DFS. This way we could 
> save data in case of a software bug in Hadoop or user code. 
> The other advantage would be the ability to export a table to multiple 
> locations. Say I had a read_only table that must be online. I could take a 
> snapshot of it when needed, export it to a separate data center, and have it 
> loaded there; then I would have it online at multiple data centers for load 
> balancing and failover.
> I understand that Hadoop removes the need for backups to protect 
> against failed servers, but this does not protect us from software bugs that 
> might delete or alter data in ways we did not plan. We should have a way to 
> roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
