[ https://issues.apache.org/jira/browse/HBASE-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509013#comment-13509013 ]
Ted Yu commented on HBASE-7245:
-------------------------------

The discussion from HBASE-6721 is related in this regard.
Francis started by storing group information on HDFS. Later he switched to storing it in a table. Whether to store it in ZooKeeper is under review.
I am fine with storing the operation directive on HDFS.

> Recovery on failed snapshot restore
> -----------------------------------
>
>                 Key: HBASE-7245
>                 URL: https://issues.apache.org/jira/browse/HBASE-7245
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jonathan Hsieh
>            Assignee: Matteo Bertozzi
>             Fix For: hbase-6055, 0.96.0
>
>
> Restore will do updates to the file system and to meta. It seems that an
> inopportune failure before meta is completely updated could result in an
> inconsistent state that would require hbck to fix.
> We should define what the semantics are for recovering from this. Some
> suggestions:
> 1) Fail forward (see some log saying restore's meta edits were not completed,
> then gather the information necessary to build it all from fs, and complete
> the meta edits).
> 2) Fail backwards (see some log saying restore's meta edits were not
> completed, then delete the incomplete snapshot region entries from meta).
> I think I prefer 1 -- if two processes somehow end up updating (somehow the
> original master didn't die, and a new one started up), the updates would be
> idempotent. If we used 2, we could still have a race and still be in a bad
> place.
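
To illustrate why option 1 (fail forward) tolerates two masters running recovery, here is a minimal Python sketch. It is not HBase code: `Cluster`, `begin_restore`, and `fail_forward` are hypothetical names, and plain dictionaries stand in for the file system, meta, and the persisted operation directive. It assumes each meta edit is an upsert keyed by region name, which is what makes repeating the recovery harmless.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for the stores mentioned in the issue (all hypothetical)."""
    fs_regions: dict = field(default_factory=dict)        # region name -> info on the file system
    meta: dict = field(default_factory=dict)              # region name -> info recorded in meta
    pending_restores: set = field(default_factory=set)    # persisted "restore in progress" directives


def begin_restore(cluster, table, regions):
    """Persist the directive first, then write the region data to the file system."""
    cluster.pending_restores.add(table)
    for name, info in regions.items():
        cluster.fs_regions[name] = info


def fail_forward(cluster, table):
    """Complete the meta edits for `table` from what is on the file system.

    Each write is an upsert keyed by region name, so running this twice,
    or from two masters, leaves meta in the same final state.
    Returns True if a pending restore was found and completed.
    """
    if table not in cluster.pending_restores:
        return False  # no directive recorded, nothing to recover
    for name, info in cluster.fs_regions.items():
        if info["table"] == table:
            cluster.meta[name] = info  # overwrite or create, never append
    cluster.pending_restores.discard(table)
    return True
```

A crash between `begin_restore` and the last meta edit leaves the directive in place, so whichever master notices it calls `fail_forward` and converges on the same meta contents. Option 2 would delete entries instead, so a second process racing with the first could remove entries the first had already completed.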