[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853491#action_12853491 ]

stack commented on HBASE-50:
----------------------------

@ Li Chongxin Let me add to Todd's great comments.

bq. ...And if writeToWAL is set to false for the Put, data might be lost for
the snapshot

If a user sets 'do not write WAL' on their Puts, they should not be
surprised if their data does not show up in the snapshot when going route b.)
in 2 above. (IMO, option a.) in 2 above would make it impossible to come near
the first of Todd's bullet points above.)

Above you suggest that manifests can be edited after they are written.
I'd suggest that once written, they are never changed. Regarding how to find a
file that has been 'moved'/'renamed', I'd suggest we settle on a convention.
Files that HBase is done with either get moved to a shadow directory structure
of deleted files or are renamed with a '.deleted' suffix. When fetching a file,
if we fail to find it at the path in the manifest, we'll append '.deleted' and
try again.
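A minimal sketch of the lookup-with-fallback convention described above, assuming the '.deleted' suffix variant (not the shadow-directory variant) and using the local filesystem as a stand-in for HDFS. The function name `resolve_snapshot_file` and the constant `DELETED_SUFFIX` are hypothetical, not existing HBase code.

```python
import os
from typing import Optional

# Hypothetical: suffix given to files HBase is done with, per the
# '.deleted' convention proposed above (the exact name is an assumption).
DELETED_SUFFIX = ".deleted"


def resolve_snapshot_file(path: str) -> Optional[str]:
    """Return the path at which a manifest-listed file can currently be read.

    Tries the exact path recorded in the manifest first; if that file is
    gone, tries the same path with DELETED_SUFFIX appended. Returns None
    when neither exists. The local filesystem stands in for HDFS here.
    """
    if os.path.exists(path):
        return path
    renamed = path + DELETED_SUFFIX
    if os.path.exists(renamed):
        return renamed
    return None
```

Because manifests are never edited, the reader carries the burden of following a rename; the writer side only has to apply the same fixed suffix when HBase finishes with a file.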

Yes, split does not mutate files.

I think that the manifest needs to be made up of one file per running
regionserver. If regions are offline at the time of the snapshot, that
shouldn't be too hard to figure out and accommodate.
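A rough sketch of a snapshot manifest split into one file per running regionserver, where each file is written once and never overwritten (exclusive create). The directory layout, the `<server>.manifest` file naming, the JSON encoding, and both function names are assumptions for illustration only; handling of offline regions is not shown.

```python
import json
import os
from typing import Dict, List

MANIFEST_EXT = ".manifest"  # hypothetical naming for per-regionserver files


def write_regionserver_manifest(snapshot_dir: str, server_name: str,
                                store_files: List[str]) -> str:
    """Write one regionserver's manifest; refuse to overwrite an existing one."""
    os.makedirs(snapshot_dir, exist_ok=True)
    path = os.path.join(snapshot_dir, server_name + MANIFEST_EXT)
    # Mode "x" is exclusive create: raises FileExistsError if the manifest
    # already exists, so a written manifest is never changed.
    with open(path, "x") as f:
        json.dump(sorted(store_files), f)
    return path


def read_snapshot(snapshot_dir: str) -> Dict[str, List[str]]:
    """Combine all per-regionserver manifests into {server_name: files}."""
    result: Dict[str, List[str]] = {}
    for name in sorted(os.listdir(snapshot_dir)):
        if name.endswith(MANIFEST_EXT):
            with open(os.path.join(snapshot_dir, name)) as f:
                result[name[: -len(MANIFEST_EXT)]] = json.load(f)
    return result
```

Each regionserver writes only its own file, so no coordination is needed while writing; the full snapshot is just the union of whatever manifests are present.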

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Priority: Minor
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like this option to do is merge all the data into
> one or more files stored in the same folder on the DFS. This way we could
> save data in case of a software bug in Hadoop or user code.
> The other advantage would be the ability to export a table to multiple locations.
> Say I had a read_only table that must be online. I could take a snapshot of
> it when needed, export it to a separate data center, and have it loaded
> there; then I would have it online at multiple data centers for load
> balancing and failover.
> I understand that Hadoop removes the need for backups to protect
> against failed servers, but this does not protect us from software bugs that
> might delete or alter data in ways we did not plan. We should have a way to
> roll back a dataset.
