[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876699#action_12876699 ]

Li Chongxin commented on HBASE-50:
----------------------------------

bq. ... but also after snapshot is done.... your design should include 
description of how files are archived, rather than deleted...

Are you talking about files that are no longer used by the hbase table but are 
still referenced by a snapshot? I think this has been described in chapter 6, 
'Snapshot Maintenance'. For example, hfiles are archived in the delete 
directory instead of being removed, and section 6.4 describes how these files 
will eventually be cleaned up.
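
A minimal sketch of that archiving step, assuming the '.deleted' directory 
sits under '.snapshot' as discussed later in this comment; the FileSystem 
calls are plain Hadoop APIs, everything else here is invented for 
illustration:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HFileArchiver {
  /**
   * Instead of deleting an hfile that a snapshot still references,
   * move it into the delete directory so the snapshot stays readable.
   * The cleanup described in section 6.4 removes it later, once no
   * snapshot references it any more.
   */
  public static void archive(Configuration conf, Path hfile)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path deletedDir = new Path("/hbase/.snapshot/.deleted");  // assumed layout
    fs.mkdirs(deletedDir);
    Path target = new Path(deletedDir, hfile.getName());
    if (!fs.rename(hfile, target)) {
      throw new IOException("Failed to archive " + hfile + " to " + target);
    }
  }
}
{code}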

bq. ..In fact you'll probably be doing a snapshot of at least a subset of 
.META. on every table snapshot I'd imagine - at least the entries for the 
relevant table.

The .META. entries for the snapshotted table have already been dumped, haven't 
they? Why would we still need a snapshot of a subset of .META.?

bq. So, do you foresee your restore-from-snapshot running split over the logs 
as part of the restore? That makes sense to me.

Yes, restore-from-snapshot has to run split over the WAL logs, and that will 
take some time, so restore-from-snapshot will not be very fast.
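
To show why this costs time, a rough sketch of what the split amounts to: one 
WAL interleaves edits from many regions, so restore has to bucket entries per 
region before replaying them. The entry type E and the regionOf function are 
stand-ins for the real log entry and its region key:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class WalSplitSketch {
  /**
   * Bucket interleaved log entries by region so each region can replay
   * only its own edits. This is a full pass over every log, which is
   * one reason restore-from-snapshot is not fast.
   */
  static <E> Map<String, List<E>> split(Iterable<E> log,
      Function<E, String> regionOf) {
    Map<String, List<E>> perRegion = new HashMap<>();
    for (E entry : log) {
      perRegion.computeIfAbsent(regionOf.apply(entry),
          r -> new ArrayList<>()).add(entry);
    }
    return perRegion;
  }
}
{code}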

bq. Why you think we need a Reference to the hfile? Why not just a file that 
lists the names of all the hfiles? We don't need to execute the snapshot, do 
we? Restoring from a snapshot would be a bunch of file renames and wal 
splitting?

At first I thought the snapshot should keep the table directory structure for 
later use. For example, a reader like HalfStoreFileReader could be provided so 
that we could read from the snapshot directly. But yes, we don't actually 
execute the snapshot, so keeping a list of all the hfiles (actually one list 
per RS, right?) should be enough. Also, restoring from a snapshot is not just 
file renames: since an hfile might be referenced by several snapshots, we 
should probably do a real copy when restoring, right?
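
To make the copy-on-restore point concrete, a hedged sketch that reads one 
per-RS manifest (assumed here to be a plain text file listing hfile paths, one 
per line) and copies each hfile back rather than renaming it:

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class SnapshotRestore {
  /**
   * Restore copies (never renames) every hfile named in a per-RS
   * manifest: the same hfile may be referenced by several snapshots,
   * so the archived copy must stay where it is.
   */
  public static void restore(Configuration conf, Path manifest, Path tableDir)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    BufferedReader in =
        new BufferedReader(new InputStreamReader(fs.open(manifest)));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        Path hfile = new Path(line.trim());
        // a real restore would rebuild the region/family layout;
        // flattened into tableDir here to keep the sketch short
        FileUtil.copy(fs, hfile, fs, new Path(tableDir, hfile.getName()),
            false /* deleteSource */, conf);
      }
    } finally {
      in.close();
    }
  }
}
{code}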

bq. Shall we name the new .META. column family snapshot rather than reference?

Sure.

bq. On the filename '.deleted', I think it a mistake to give it a '.' prefix 
especially given its in the snapshot dir...

OK, I will rename the snapshot dir to '.snapshot'. For the dir '.deleted', 
what name do you think we should use? Because there might be several snapshots 
under the '.snapshot' dir, each with its own snapshot name, I named this dir 
'.deleted' to distinguish it from a snapshot name.

bq. Do you need a new catalog table called snapshots to keep list of snapshots, 
of what a snapshot comprises and some other metadata such as when it was made, 
whether it succeeded, who did it and why?

It would be much more convenient if a catalog table 'snapshot' could be 
created. Will this impact normal operation of hbase?
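
Purely to make the proposed metadata concrete, a sketch of what one row of 
such a catalog table could record via the ordinary client API; the table name 
'.SNAPSHOT' and the column layout are invented for illustration:

{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SnapshotCatalog {
  /** One row per snapshot, keyed by snapshot name (assumed schema). */
  public static void recordSnapshot(String snapshotName, String table,
      String user, String reason) throws Exception {
    HTable snapshots = new HTable(HBaseConfiguration.create(), ".SNAPSHOT");
    Put p = new Put(Bytes.toBytes(snapshotName));
    byte[] info = Bytes.toBytes("info");
    p.add(info, Bytes.toBytes("table"), Bytes.toBytes(table));
    p.add(info, Bytes.toBytes("created"),
        Bytes.toBytes(System.currentTimeMillis()));
    p.add(info, Bytes.toBytes("user"), Bytes.toBytes(user));
    p.add(info, Bytes.toBytes("reason"), Bytes.toBytes(reason));
    p.add(info, Bytes.toBytes("state"), Bytes.toBytes("IN_PROGRESS"));
    snapshots.put(p);
    snapshots.close();
  }
}
{code}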

bq. Section 7.4 is missing split of WAL files. Perhaps this can be done in a MR 
job? 

I'll add the split of WAL logs. Yes, an MR job can be used. Which method do 
you think is better: reading from the imported file and inserting into the 
table via the hbase api, or just copying the hfile into place and updating 
.META.?
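
A sketch of the first option, replaying edits through the client API; the 
Iterable of KeyValues stands in for whatever the split/import step produces. 
The second option, copying hfiles into place, skips this write path entirely, 
which is why it should be faster but needs .META. edited by hand:

{code:java}
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class WalReplay {
  /**
   * Option 1: re-insert every recovered edit through the normal write
   * path. Simple, but each edit goes through the memstore and gets
   * flushed into brand-new hfiles, so it is the slower option.
   */
  public static void replay(HTable table, Iterable<KeyValue> edits)
      throws Exception {
    for (KeyValue kv : edits) {
      Put p = new Put(kv.getRow());
      p.add(kv);  // keeps family, qualifier, timestamp and value
      table.put(p);
    }
    table.flushCommits();
  }
}
{code}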

bq. Lets not have the master run the snapshot... let the client run it?
bq. Snapshot will be doing same thing whether table is partially online or not..

I put these two issues together because I think they are related. In the 
current design, if a table is open, the snapshot is performed by each RS that 
serves the table's regions; otherwise, if the table is closed, the snapshot is 
performed by the master, because the table is not served by any RS. The first 
comment is about a closed table, so the master performs the snapshot since the 
client does not have access to the underlying dfs. For the second one, I was 
thinking that if a table is partially online, its regions might be partially 
served by RSs and partially offline, right? Then who should perform the 
snapshot? If the RSs, the offline regions will be missed; if the master, the 
online regions might lose data still in the memstore. I'm confused.
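
Purely to make the question concrete, one possible reading is a per-region 
dispatch, sketched below with invented helper names (regionIsOnline, 
askRegionServer and snapshotFromFs are not real APIs):

{code:java}
import org.apache.hadoop.hbase.HRegionInfo;

/** Sketch only: the three abstract helpers are invented stand-ins. */
public abstract class SnapshotDispatch {
  abstract boolean regionIsOnline(HRegionInfo region);
  abstract void askRegionServer(HRegionInfo region);  // RS flushes + snapshots
  abstract void snapshotFromFs(HRegionInfo region);   // master reads dfs

  void snapshotTable(Iterable<HRegionInfo> regions) {
    for (HRegionInfo region : regions) {
      if (regionIsOnline(region)) {
        askRegionServer(region);  // memstore gets flushed, nothing is lost
      } else {
        snapshotFromFs(region);   // offline region: hfiles on dfs are complete
      }
    }
  }
}
{code}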

bq. It's a synchronous way. Do you think this is appropriate? Yes. I'm w/ JG on 
this.

This is another problem that confuses me. In the current design (which is 
synchronous), a snapshot is started only when all the RSs are ready for it; 
then all RSs perform the snapshot concurrently. This guarantees the snapshot 
is not started if any RS fails. If we switch to an asynchronous approach, 
should each RS start its snapshot immediately once it is ready?
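
The synchronous scheme is essentially a barrier: no RS commits until every RS 
has reported ready. A minimal in-process sketch of that barrier (the real 
system would coordinate over RPC or ZooKeeper, not a shared latch):

{code:java}
import java.util.concurrent.CountDownLatch;

/**
 * Two-step synchronous protocol: every region server prepares, then
 * waits at a barrier; only when all have prepared does any of them
 * commit. If one RS fails to prepare, none of them commits.
 */
public class SnapshotBarrier {
  private final CountDownLatch allReady;

  public SnapshotBarrier(int regionServers) {
    this.allReady = new CountDownLatch(regionServers);
  }

  /** Called by each region server once its snapshot is prepared. */
  public void ready() {
    allReady.countDown();
  }

  /** Each region server blocks here until every RS is ready. */
  public void awaitStart() throws InterruptedException {
    allReady.await();
  }
}
{code}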

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, snapshot-src.zip
>
>
> Having an option to take a snapshot of a table would be very useful in 
> production.
> What I would like to see this option do is a merge of all the data into 
> one or more files stored in the same folder on the dfs. This way we could 
> save data in case of a software bug in hadoop or user code. 
> The other advantage would be being able to export a table to multiple 
> locations. Say I had a read_only table that must be online. I could take a 
> snapshot of it when needed, export it to a separate data center, and have 
> it loaded there; then I would have it online at multiple data centers for 
> load balancing and failover.
> I understand that hadoop takes the need out of having backups to protect 
> from failed servers, but this does not protect us from software bugs that 
> might delete or alter data in ways we did not plan. We should have a way 
> to roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
