[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876330#action_12876330
 ] 

stack commented on HBASE-50:
----------------------------

.bq ...What I mean here is, snapshot should be put in class HBaseAdmin instead 
of HTable. 

OK.  I misunderstood.

.bq 2, ...just make sure the table data (hfiles indeed) are not mutated when 
snapshot.

Yes... but also after the snapshot is done.  Your design should include a 
description of how files are archived, rather than deleted, when they are no 
longer needed post-snapshot.  The design should also describe how a 
snapshot-restore tool will know where to find files that have been put aside 
in archives rather than deleted.
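
To make the archive-rather-than-delete idea concrete, here is a minimal sketch.  
It is not the design from the report: the class name SnapshotArchive, the 
dataRoot/archiveRoot layout and the use of the local filesystem via 
java.nio.file (rather than HDFS) are all assumptions for illustration.  A file 
that is no longer needed is moved to the same relative path under an archive 
root, so a restore tool can look in the live location first and then in the 
archive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of HBase: archives files instead of deleting them.
final class SnapshotArchive {
    private final Path dataRoot;    // assumed root of the live table files
    private final Path archiveRoot; // assumed root where files are put aside

    SnapshotArchive(Path dataRoot, Path archiveRoot) {
        this.dataRoot = dataRoot;
        this.archiveRoot = archiveRoot;
    }

    // Where a live file lives (or would live) once it has been archived.
    private Path archivedLocation(Path file) {
        return archiveRoot.resolve(dataRoot.relativize(file));
    }

    // Move the file aside, keeping its path relative to dataRoot.
    Path archive(Path file) throws IOException {
        Path target = archivedLocation(file);
        Files.createDirectories(target.getParent());
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Restore-side lookup: live location first, then the archive; null if neither has it.
    Path locate(Path file) {
        if (Files.exists(file)) {
            return file;
        }
        Path archived = archivedLocation(file);
        return Files.exists(archived) ? archived : null;
    }
}
```

Keeping the relative path unchanged means the restore tool needs no extra 
index to find archived files; whether that is enough for the real design is 
an open question.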

.bq I'm not sure whether RS or the master should take on the responsibility to 
perform the snapshot at this time. 

I'd say just forge ahead with the snapshot.  The snapshot will be doing the 
same thing whether the table is partially online or not, methinks.  If you 
can, warn the user that the snapshot is of a table that is partially online, 
but that is not important.  A partially-offlined table can be fixed up after 
snapshot restore.  It's outside the scope of your issue to figure out whether 
the table is healthy or not, I'd say.

.bq Another LogCleanerDelegate, say ReferencedLogCleaner, could be created to 
check whether the log file should be deleted for the consideration of snapshot. 
What do you think?

Sounds good.

I need to review J-D's replication.  I can add a note that he needs to be 
conscious that others will want a say on when files are cleaned up.
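
As a rough illustration of how several parties could each have a say on log 
cleanup, here is a sketch.  The interface below is a simplified stand-in 
written for this example, not HBase's actual LogCleanerDelegate; the method 
name isLogDeletable, the String paths and the canDelete helper are 
assumptions.  A log is deleted only if every delegate agrees, so a 
ReferencedLogCleaner can veto deletion of logs that a snapshot still 
references.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of chained log-cleaner delegates; not HBase code.
final class SnapshotLogCleaning {

    // Stand-in for a cleaner delegate: each one may veto deletion of a log file.
    interface LogCleanerDelegate {
        boolean isLogDeletable(String logPath);
    }

    // Vetoes deletion of any log that a snapshot still references.
    static final class ReferencedLogCleaner implements LogCleanerDelegate {
        private final Set<String> referenced = new HashSet<>();

        void addReference(String logPath) {
            referenced.add(logPath);
        }

        void removeReference(String logPath) {
            referenced.remove(logPath);
        }

        @Override
        public boolean isLogDeletable(String logPath) {
            return !referenced.contains(logPath);
        }
    }

    // A log may be deleted only when every delegate (snapshot, replication, ...) agrees.
    static boolean canDelete(String logPath, List<LogCleanerDelegate> delegates) {
        for (LogCleanerDelegate delegate : delegates) {
            if (!delegate.isLogDeletable(logPath)) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, replication could plug in its own delegate next to the 
snapshot one without either knowing about the other.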

Regards snapshot of -ROOT-, don't worry about it.  There is nothing in there.  
Regards snapshot of .META., that should be possible.  In fact you'll probably 
be doing a snapshot of at least a subset of .META. on every table snapshot I'd 
imagine -- at least the entries for the relevant table.

.bq It's a synchronous way. Do you think this is appropriate? 

Yes.  I'm w/ JG on this.

.bq Do you mean a znode is created for each RS to keep the progress? 

OK. Let's not do this for now.  It's the kinda thing implementation will bring 
out.  At implementation time you may find you need it... but hopefully the 
snapshot runs so fast, there'll be no need of the intermediary.




> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, snapshot-src.zip
>
>
> Having an option to take a snapshot of a table would be very useful in 
> production.
> What I would like to see this option do is merge all the data into 
> one or more files stored in the same folder on the dfs. This way we could 
> save data in case of a software bug in hadoop or user code. 
> The other advantage would be the ability to export a table to multiple 
> locations. 
> Say I had a read_only table that must be online. I could take a snapshot of 
> it when needed, export it to a separate data center, and have it loaded 
> there; then I would have it online at multiple data centers for load 
> balancing and failover.
> I understand that hadoop removes the need for backups to protect 
> against failed servers, but this does not protect us from software bugs that 
> might delete or alter data in ways we did not plan. We should have a way to 
> roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
