[
https://issues.apache.org/jira/browse/HBASE-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508857#comment-13508857
]
Matteo Bertozzi commented on HBASE-7245:
----------------------------------------
The operations that have this kind of problem are:
* create table: remove the table if the operation failed (rollback); the user
has already received the failure
* delete table: finish removing the table (rollforward); restoring the table is
impossible
* clone table: remove the table if the operation failed (rollback); same as
create table
* restore table: finish restoring the table (rollforward)
* snapshot: remove the tmp folder (rollback)
One simple solution is to drop an "operation lock" file in the table folder.
On master startup, if the file is present, read the serialized operation enum
and execute the corresponding rollback or rollforward. (Note that if the master
is not down, you can do the recovery when catching the exception.)
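
A minimal sketch of how such an "operation lock" file could work, using only
java.nio against a local directory. This is an illustration, not HBase code:
the class name, the enum, the ".operation-lock" file name, and the choice to
serialize the enum as its plain name are all assumptions made for the example.
The mapping of operations to rollback/rollforward follows the list above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the "operation lock" idea; not an HBase API. */
final class OperationLockSketch {

    /** Direction in which an interrupted operation is resolved. */
    enum Recovery { ROLLBACK, ROLLFORWARD }

    /** The operations listed above, each mapped to its recovery direction. */
    enum TableOperation {
        CREATE_TABLE(Recovery.ROLLBACK),
        DELETE_TABLE(Recovery.ROLLFORWARD),
        CLONE_TABLE(Recovery.ROLLBACK),
        RESTORE_TABLE(Recovery.ROLLFORWARD),
        SNAPSHOT(Recovery.ROLLBACK);

        final Recovery recovery;

        TableOperation(Recovery recovery) {
            this.recovery = recovery;
        }
    }

    /** Assumed file name; the original comment does not specify one. */
    static final String LOCK_FILE = ".operation-lock";

    /** Drop the lock file before the operation starts changing the table folder. */
    static void acquire(Path tableDir, TableOperation op) throws IOException {
        Files.createDirectories(tableDir);
        Files.write(tableDir.resolve(LOCK_FILE),
                op.name().getBytes(StandardCharsets.UTF_8));
    }

    /** Remove the lock file once the operation (or its recovery) has completed. */
    static void release(Path tableDir) throws IOException {
        Files.deleteIfExists(tableDir.resolve(LOCK_FILE));
    }

    /**
     * Startup check: returns the interrupted operation recorded in the table
     * folder, or null if no lock file is present.
     */
    static TableOperation pendingOperation(Path tableDir) throws IOException {
        Path lock = tableDir.resolve(LOCK_FILE);
        if (!Files.exists(lock)) {
            return null;
        }
        String name = new String(Files.readAllBytes(lock), StandardCharsets.UTF_8).trim();
        return TableOperation.valueOf(name);
    }
}
```

On startup the master would call pendingOperation() for each table folder,
look at the returned operation's recovery field to decide between rollback and
rollforward, run that recovery, and then call release(). The recovery actions
themselves are left out because they depend on master internals.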
> Recovery on failed restore.
> ---------------------------
>
> Key: HBASE-7245
> URL: https://issues.apache.org/jira/browse/HBASE-7245
> Project: HBase
> Issue Type: Sub-task
> Components: Client, master, regionserver, snapshots, Zookeeper
> Reporter: Jonathan Hsieh
> Assignee: Matteo Bertozzi
> Fix For: hbase-6055, 0.96.0
>
>
> Restore will do updates to the file system and to meta. It seems that an
> inopportune failure before meta is completely updated could result in an
> inconsistent state that would require hbck to fix.
> We should define what the semantics are for recovering from this. Some
> suggestions:
> 1) Fail Forward (see some log saying restore's meta edits not completed, then
> gather information necessary to build it all from fs, and complete meta
> edits.).
> 2) Fail backwards (see some log saying restore's meta edits not completed,
> delete incomplete snapshot region entries from meta.)
> I think I prefer 1 -- if two processes somehow both end up updating (say the
> original master didn't die, and a new one started up), their updates would be
> idempotent. If we used 2, we could still have a race and still be in a bad
> place.
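
To illustrate why option 1 (fail forward) is idempotent, here is a small
sketch that models meta as an in-memory map and the file system as a set of
region names. Both are stand-ins chosen for the example; the class, the method,
and the "table,region" key format are hypothetical and not HBase APIs. The
point is only that rebuilding meta from what is on the fs gives the same result
no matter how many times, or by how many masters, it is applied.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical fail-forward sketch; meta and the fs are modeled in memory. */
final class FailForwardSketch {

    /**
     * Completes the meta edits for one table from the regions found on the fs.
     * Rows for regions that no longer exist are dropped and rows for existing
     * regions are (re)written, so a second run leaves meta unchanged.
     */
    static void completeMetaEdits(Map<String, String> meta,
                                  String table,
                                  Set<String> regionsOnFs) {
        String prefix = table + ",";
        // Drop rows of this table whose region is not present on the fs.
        meta.keySet().removeIf(key ->
                key.startsWith(prefix)
                        && !regionsOnFs.contains(key.substring(prefix.length())));
        // Write one row per region present on the fs; put() overwrites in place.
        for (String region : regionsOnFs) {
            meta.put(prefix + region, "restored");
        }
    }
}
```

Because the final state depends only on the regions on the fs, two masters
running completeMetaEdits() for the same table converge on the same rows.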