[
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852941#action_12852941
]
stack commented on HBASE-50:
----------------------------
Just to add that Todd had a really nice idea yesterday morning: another use
for snapshots would be running MapReduce against the raw HBase table hfiles
rather than going via the HBase API (when we write storefiles to the manifest,
we should preserve their order so that the newest is first, and so on). Before
starting the MR job, you'd trigger a snapshot and give the snapshot number as
the MR job input. We'd then write a snapshot input format that could read all
the manifests and feed hfile content as KeyValue inputs.
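
To illustrate why the storefile order in the manifest matters, here is a rough
sketch of how a reader might merge the files of one snapshot. Everything in it
is an assumption for illustration only: the manifest is modelled as a plain
list of storefiles (newest first), each storefile as a key-sorted list of
(row, value) pairs, and `read_snapshot` is a made-up name. Timestamps,
versions, and delete markers are ignored, so this is not how the real input
format would have to work.

```python
import heapq
from typing import Iterator, List, Tuple

# Hypothetical stand-in for a KeyValue: (row key, value).
Cell = Tuple[str, str]


def read_snapshot(manifest: List[List[Cell]]) -> Iterator[Cell]:
    """Merge key-sorted storefiles listed newest-first in the manifest.

    When the same row key appears in several files, only the cell from the
    file listed earliest (the newest one) is emitted.
    """
    # Tag each cell with its file's position in the manifest. For equal keys,
    # a smaller position (newer file) sorts first in the merged stream.
    tagged = (
        [(key, position, value) for key, value in store_file]
        for position, store_file in enumerate(manifest)
    )
    last_key = None
    for key, _position, value in heapq.merge(*tagged):
        if key != last_key:  # first hit for a key comes from the newest file
            yield key, value
            last_key = key


if __name__ == "__main__":
    manifest = [
        [("a", "new-a"), ("c", "c1")],  # newest storefile
        [("a", "old-a"), ("b", "b0")],  # older storefile
    ]
    for cell in read_snapshot(manifest):
        print(cell)
```

If the manifest lost the newest-first order, the reader above would have no
way to tell which of the two "a" cells to keep without opening the files and
comparing their metadata.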
> Snapshot of table
> -----------------
>
> Key: HBASE-50
> URL: https://issues.apache.org/jira/browse/HBASE-50
> Project: Hadoop HBase
> Issue Type: New Feature
> Reporter: Billy Pearson
> Priority: Minor
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like to see this option do is merge all the data into
> one or more files stored in the same folder on the dfs. This way we could
> save data in case of a software bug in hadoop or user code.
> The other advantage would be the ability to export a table to multiple
> locations.
> Say I had a read_only table that must be online. I could take a snapshot of
> it when needed and export it to a separate data center and have it loaded
> there, and then I would have it online at multiple data centers for load
> balancing and failover.
> I understand that hadoop removes the need for backups to protect
> from failed servers, but this does not protect us from software bugs that
> might delete or alter data in ways we did not plan. We should have a way to
> roll back a dataset.