[jira] Commented: (HADOOP-2496) Snapshot of table

stack (JIRA) Sat, 29 Dec 2007 14:17:04 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554960
 ]


stack commented on HADOOP-2496:
-------------------------------

Here are some ideas for how this might work, Billy.

HADOOP-1958 talks of making a table read-only.  It also talks of being able to 
send a flush-and-compact command across a cluster/table so that all in-memory 
entries are persisted, followed by a compaction to tidy up the on-disk 
representation.  Jim is currently working on HADOOP-2478, which will move 
everything to do with a particular table under a directory named for the table 
in HDFS.  Hadoop has a copy-files utility that can take a source in one 
filesystem and a target in the same or another filesystem and will run a 
mapreduce job to do a fast copy.
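
Put together, those pieces suggest an ordering for taking the snapshot. What 
follows is only a rough sketch of that ordering, not a real implementation: 
FakeTable, set_read_only, flush_and_compact and snapshot_table are all 
hypothetical names, none of them an existing HBase or Hadoop API, and a local 
shutil.copytree stands in for the mapreduce-based copy. Turning writes back on 
afterwards is also my assumption; a table that is meant to stay read-only 
would skip that step.

```python
import shutil
from pathlib import Path


class FakeTable:
    """Hypothetical stand-in for a table; not a real HBase API."""

    def __init__(self, table_dir):
        self.table_dir = Path(table_dir)
        self.read_only = False
        self.log = []  # records the order in which steps ran

    def set_read_only(self, flag):
        self.read_only = flag
        self.log.append(f"read_only={flag}")

    def flush_and_compact(self):
        # Stand-in for persisting in-memory entries and tidying on-disk files.
        self.log.append("flush_and_compact")


def snapshot_table(table, dest_dir):
    """Freeze writes, flush and compact, copy the table directory, unfreeze."""
    table.set_read_only(True)
    try:
        table.flush_and_compact()
        # A local copy stands in for the mapreduce-based file copy utility.
        shutil.copytree(table.table_dir, dest_dir)
        table.log.append("copy")
    finally:
        # Assumption: writes are re-enabled even if the copy fails.
        table.set_read_only(False)
    return Path(dest_dir)
```

The point of the try/finally is that a failed copy should not leave the table 
stuck read-only.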

Deploying the backup copy would run pretty much as you suggest, only I'd 
imagine we'd have a tool that read the backed-up table directory and, for each 
region found, did an insert into the catalog .META. table (the same tool run 
with a different option would purge a table from the catalog). 
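
A minimal sketch of what such a tool might do, under loud assumptions: the 
catalog is modelled as a plain dict rather than the real .META. table, each 
subdirectory of the table directory is assumed to be one region, and the row 
key format and the function names (find_regions, deploy_table, purge_table) 
are made up for illustration.

```python
from pathlib import Path


def find_regions(table_dir):
    """Return the names of region subdirectories, sorted for stable output."""
    return sorted(p.name for p in Path(table_dir).iterdir() if p.is_dir())


def deploy_table(catalog, table_name, table_dir):
    """Insert one catalog row per region found under table_dir."""
    for region in find_regions(table_dir):
        # Hypothetical row key and row contents; the real catalog differs.
        catalog[f"{table_name},{region}"] = {
            "table": table_name,
            "region_dir": region,
        }
    return catalog


def purge_table(catalog, table_name):
    """Remove every catalog row that belongs to table_name."""
    doomed = [key for key, row in catalog.items() if row["table"] == table_name]
    for key in doomed:
        del catalog[key]
    return catalog
```

Running deploy_table against a restored directory and purge_table against the 
same table name would correspond to the two options of the one tool.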

> Snapshot of table
> -----------------
>
>                 Key: HADOOP-2496
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2496
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>             Fix For: 0.16.0
>
>
> Having an option to take a snapshot of a table would be very useful in 
> production.
> What I would like this option to do is merge all the data into 
> one or more files stored in the same folder on the dfs. This way we could 
> save data in case of a software bug in hadoop or user code. 
> The other advantage would be the ability to export a table to multiple 
> locations. 
> Say I had a read_only table that must be online. I could take a snapshot of 
> it when needed and export it to a separate data center and have it loaded 
> there, and then I would have it online at multiple data centers for load 
> balancing and failover.
> I understand that hadoop removes the need for backups to protect 
> against failed servers, but this does not protect us from software bugs that 
> might delete or alter data in ways we did not plan. We should have a way to 
> roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
