[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Chongxin updated HBASE-50:
-----------------------------

    Attachment: HBase Snapshot Design Report V3.pdf

The design document has been updated based on the discussion. The following 
changes have been made:

* Requirements have been updated

* Snapshots can now be created for both online (enabled) and offline 
(disabled) tables. For an offline table, the snapshot is performed by the master

* Metadata for the table is no longer copied from .regioninfo; it is dumped 
entirely from .META.

* WAL logs are now archived instead of deleted, so a snapshot no longer copies 
the log files; it keeps a file that lists the log names. A new section 6.5 
on log maintenance has been added

* Rename 'reference' family in .META. to 'snapshot'

* Add the same column family 'snapshot' to -ROOT- so that a snapshot of .META. 
can be taken too

* A new file .snapshotinfo is created under each snapshot dir to hold the 
meta information of the snapshot. The list operation for snapshots reads this 
meta file.

* A new operation 'Restore' is added to restore a table from a snapshot in the 
same data center

* Export and import are changed. They are now used to export a snapshot to, 
or import a snapshot from, other data centers. An exported snapshot therefore 
has the same file format as an exported table, so that we can treat it the 
same as an exported table and import it with the same import facility.
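
To make the .snapshotinfo and log-list changes above concrete, here is a 
minimal sketch in Python (not HBase code, which is Java). It only models the 
idea on a local directory. The JSON content of .snapshotinfo, the file name 
"wal-list" and both function names are assumptions made for illustration; 
the design report defines the real layout.

```python
import json
import tempfile
from pathlib import Path


def create_snapshot(snapshot_root, name, table, wal_names):
    """Create a snapshot dir with metadata and a list of WAL names.

    The log files themselves are not copied; only their names are recorded,
    on the assumption that the logs are archived rather than deleted.
    """
    snap_dir = Path(snapshot_root) / name
    snap_dir.mkdir(parents=True)
    # .snapshotinfo is named in the design; its JSON content is an assumption.
    info = {"name": name, "table": table}
    (snap_dir / ".snapshotinfo").write_text(json.dumps(info))
    # "wal-list" is a hypothetical file name: one log name per line.
    (snap_dir / "wal-list").write_text("".join(w + "\n" for w in wal_names))
    return snap_dir


def list_snapshots(snapshot_root):
    """List snapshots by reading only the .snapshotinfo file of each dir."""
    infos = []
    for snap_dir in sorted(p for p in Path(snapshot_root).iterdir() if p.is_dir()):
        info_file = snap_dir / ".snapshotinfo"
        if info_file.exists():
            infos.append(json.loads(info_file.read_text()))
    return infos


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        create_snapshot(root, "snap1", "table1", ["hlog.1", "hlog.2"])
        print(list_snapshots(root))
```

The point of the sketch is that listing never has to scan region data or 
log files; it reads one small file per snapshot.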

Pending Questions:
1.
What if a table with the same name is still online when we want to restore a 
snapshot? There will be a name collision in both HDFS and .META. We should 
not touch the existing table, right?
2.
Then shall we allow restoring the snapshot under a new table name? For example, 
if the snapshot was created for table "table1", can we restore it as "table2"?
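
One possible behaviour for the two questions above, sketched in Python only 
to make the options concrete. This is not a decision and not HBase code; the 
function name and parameters are hypothetical. The restore refuses to touch 
an existing table and accepts an optional new table name.

```python
def resolve_restore_target(existing_tables, snapshot_table, new_name=None):
    """Return the table name a restore would use.

    existing_tables: names of tables already present (hypothetical input,
    standing in for a lookup in .META.).
    snapshot_table: the table the snapshot was taken from.
    new_name: optional name to restore under instead of the original one.
    """
    target = new_name if new_name is not None else snapshot_table
    if target in existing_tables:
        # Never overwrite or modify the existing table.
        raise ValueError("table %r already exists; restore under another name" % target)
    return target


if __name__ == "__main__":
    tables = {"table1"}
    # Restoring as "table2" avoids the collision with the online "table1".
    print(resolve_restore_target(tables, "table1", new_name="table2"))
```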

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot 
> Design Report V3.pdf, snapshot-src.zip
>
>
> Having an option to take a snapshot of a table would be very useful in 
> production.
> What I would like this option to do is merge all the data into 
> one or more files stored in the same folder on the dfs. This way we could 
> save data in case of a software bug in hadoop or user code. 
> The other advantage would be the ability to export a table to multiple 
> locations. Say I had a read_only table that must be online. I could take a 
> snapshot of it when needed, export it to a separate data center and have it 
> loaded there, and then I would have it online at multiple data centers for 
> load balancing and failover.
> I understand that hadoop removes the need for backups to protect 
> from failed servers, but this does not protect us from software bugs that 
> might delete or alter data in ways we did not plan. We should have a way to 
> roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
