[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876814#action_12876814
 ] 

stack commented on HBASE-50:
----------------------------

bq. Are you talking about files that are no longer used by the HBase table but are 
still referenced by a snapshot? I think this has been described in chapter 6, 
'Snapshot Maintenance'. For example, hfiles are archived in the '.deleted' 
directory, and section 6.4 describes how these files will be cleaned up.

That'll do for now. I need to kick J-D so he's on board with this change, since 
he's going to want references to archived files for his replication work. Let 
me get him to check this section out and make sure you two fellas aren't 
fighting each other over log file archiving.

bq. .META. entries for the snapshot table have been dumped, haven't they? Why 
do we still need a snapshot of a subset of .META.?

You mean .regioninfo? If so, that's fine, yes (as long as you ensure 
.regioninfo is up to date with the snapshot).

bq. Yes, restore-from-snapshot has to run a split over the WAL logs. It will 
take some time, so restore-from-snapshot will not be very fast.

We can work on speeding it up later, no problem.

bq. ...actually one list per RS, right?

Yes, this seems right since each RS will be responsible for snapshotting its 
portion of the total data.

bq. ...we should probably do a real copy when restoring, right?

Yes, copy rather than rename, I'd say. We don't want to destroy the well of 
good hfiles, even when starting from a snapshot (this snapshot might be bad 
and we may need to go back in time to earlier snapshots, etc.).
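To make the copy-versus-rename point concrete, here is a minimal sketch in Python against a local filesystem. It is only an illustration under stated assumptions: HBase itself works on HDFS through Hadoop's FileSystem API, and both the helper name `restore_from_snapshot` and the layout (a snapshot directory mirroring region/family/hfile subdirectories) are hypothetical. Because each file is copied, the snapshot is left intact and can be restored again, or we can fall back to an earlier snapshot.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def restore_from_snapshot(snapshot_dir: Path, table_dir: Path) -> list[Path]:
    """Hypothetical helper: copy every file under snapshot_dir into table_dir.

    Assumes the snapshot directory mirrors the table layout
    (region/family/hfile). Files are copied, not moved, so the
    snapshot itself is never emptied by a restore.
    """
    restored = []
    for src in sorted(snapshot_dir.rglob("*")):
        if src.is_file():
            # Keep the same relative path (region/family/hfile) in the table dir.
            dest = table_dir / src.relative_to(snapshot_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            # A rename/move here would destroy the snapshot's copy of the hfile.
            shutil.copy2(src, dest)
            restored.append(dest)
    return restored
```

A rename would be cheaper, which is exactly the trade-off: it gives up the ability to restore the same snapshot twice.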

Hmmm... here is where your use of a Reference might actually come in handy. If 
the snapshot directory had only References under it, we could perhaps start 
against the snapshot directory, and immediately after startup, as we do for 
half-references, work hard to undo each Reference by writing a new hfile from 
what it references. This would let us come up quickly. I'd say this latter 
idea is a nice-to-have.

bq. Ok, I will rename the snapshot dir to '.snapshot'. For the dir '.deleted', 
what name do you think we should use? Because there might be several snapshots 
under the dir '.snapshot', each with its own snapshot name, I named this dir 
'.deleted' to distinguish it from a snapshot name.

I'd say ignore my comment. You have good reason for naming it with the '.' 
prefix.

bq. It'll be much more convenient if a catalog table 'snapshot' can be created. 
Will this impact normal operation of HBase?

It will in some regard: meta regions will be bigger because they will now 
carry more data, though it should be fairly small, I'd say. The extra data 
will bring on a .META. split sooner. We'll deal. Having it in a separate 
column family will keep it out of the way during normal .META. accesses. One 
problem, though, is that regions get deleted when there are no longer 
references to a split parent. Won't this mean you lose snapshot data? Would 
this require you to keep snapshots in a table of their own?

bq. Or just copy the hfile into place and update .META.?

This latter would be better, but we can start simple at first: just run the 
main of HLog and pass it the dir of WAL files to split.

I did not understand why the master needed to be in the mix. Now I understand 
its role in taking care of offlined regions. This sounds right.

I suppose you'll need to run a quick verification that the table is online. 
There should be a facility, developed as part of the fix for HBASE-7, that 
will help here by the time you get to coding.

bq. ...If we switch to an asynchronous approach, should the RS start the 
snapshot immediately when it is ready?

I do not follow.  Please retry.




> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, snapshot-src.zip
>
>
> Having an option to take a snapshot of a table would be very useful in 
> production.
> What I would like this option to do is merge all the data into one or 
> more files stored in the same folder on the DFS. This way we could save 
> data in case of a software bug in Hadoop or user code.
> The other advantage would be the ability to export a table to multiple 
> locations. Say I had a read-only table that must be online. I could take a 
> snapshot of it when needed, export it to a separate data center and have it 
> loaded there, and then I would have it online at multiple data centers for 
> load balancing and failover.
> I understand that Hadoop removes the need for backups to protect against 
> failed servers, but this does not protect us from software bugs that might 
> delete or alter data in ways we did not plan. We should have a way to roll 
> back a dataset.

-- 
This message is automatically generated by JIRA.