[ https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554960 ]
stack commented on HADOOP-2496:
-------------------------------

Here are some ideas for how this might work, Billy.

HADOOP-1958 talks of making a table read-only. It also talks of being able to send a flush-and-compact command across a cluster or table, so that all in-memory entries are persisted and then a compaction tidies up the on-disk representation.

Jim is currently working on HADOOP-2478, which will move everything belonging to a particular table under a directory in HDFS named for that table.

Hadoop has a copy-files utility (distcp) that takes a source in one filesystem and a target in the same or another filesystem, and runs a MapReduce job to do a fast, parallel copy.

Taken together, a snapshot could be: make the table read-only, flush and compact it, then copy its table directory to the backup location.

Deploying the backup copy would work pretty much as you suggest, except I'd imagine we'd have a tool that reads the backed-up table directory and, for each region found, inserts a row into the .META. catalog table. (The same tool run with a different option would purge a table from the catalog.)

> Snapshot of table
> -----------------
>
>                 Key: HADOOP-2496
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2496
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>             Fix For: 0.16.0
>
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like this option to do is merge all of the table's data into
> one or more files stored in the same folder on the DFS. This way we could
> save data in case of a software bug in Hadoop or in user code.
> The other advantage would be the ability to export a table to multiple
> locations. Say I had a read-only table that must stay online. I could take
> a snapshot of it when needed, export it to a separate data center, and have
> it loaded there; then I would have it online in multiple data centers for
> load balancing and failover.
> I understand that Hadoop removes the need for backups to protect against
> failed servers, but this does not protect us from software bugs that might
> delete or alter data in ways we did not plan. We should have a way to roll
> back a dataset.
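The restore tool stack proposes has two modes. In the first, it reads the backed-up table directory and inserts one catalog row per region found. In the second, it purges the table from the catalog. Both can be sketched in a few lines. This is a hypothetical, self-contained illustration only. The directory layout (`tableDir/regionDir`), the `table,region` row key, the row value, and the in-memory `TreeMap` standing in for .META. are all assumptions, not HBase's actual catalog format or API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of a snapshot-restore tool: scan a backed-up table
 * directory and produce one catalog row per region subdirectory found.
 * The layout and row format are assumptions, not HBase's real .META. schema.
 */
public class MetaRestoreSketch {

    /** Lists region subdirectories under a table directory (assumed layout: tableDir/regionDir). */
    static List<String> listRegionDirs(Path tableDir) throws IOException {
        List<String> regions = new ArrayList<>();
        try (Stream<Path> children = Files.list(tableDir)) {
            children.filter(Files::isDirectory)
                    .forEach(p -> regions.add(p.getFileName().toString()));
        }
        regions.sort(null); // natural (lexicographic) order
        return regions;
    }

    /** "Add" mode: one hypothetical catalog row per region, keyed "table,region". */
    static Map<String, String> buildCatalogRows(String table, List<String> regionDirs) {
        Map<String, String> rows = new TreeMap<>(); // sorted by row key
        for (String region : regionDirs) {
            rows.put(table + "," + region, "/" + table + "/" + region);
        }
        return rows;
    }

    /** "Purge" mode: removes every catalog row belonging to the given table. */
    static void purgeTable(Map<String, String> catalog, String table) {
        catalog.keySet().removeIf(key -> key.startsWith(table + ","));
    }

    public static void main(String[] args) throws IOException {
        // Simulate a backed-up table directory with two region subdirectories.
        Path tableDir = Files.createTempDirectory("webtable");
        Files.createDirectory(tableDir.resolve("region-0001"));
        Files.createDirectory(tableDir.resolve("region-0002"));

        Map<String, String> catalog = buildCatalogRows("webtable", listRegionDirs(tableDir));
        catalog.forEach((k, v) -> System.out.println(k + " -> " + v));

        purgeTable(catalog, "webtable");
        System.out.println("rows after purge: " + catalog.size());

        // Clean up the simulated directory.
        for (String r : listRegionDirs(tableDir)) {
            Files.delete(tableDir.resolve(r));
        }
        Files.delete(tableDir);
    }
}
```

Running `main` prints two rows (`webtable,region-0001 -> /webtable/region-0001` and `webtable,region-0002 -> /webtable/region-0002`), then `rows after purge: 0`. A sorted map is used because catalog rows are looked up by row key, so keeping them ordered makes a table's rows contiguous and easy to purge by prefix.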