[jira] [Commented] (HBASE-9906) Restore snapshot fails to restore the meta edits sporadically

Enis Soztutar (JIRA) Wed, 06 Nov 2013 15:57:08 -0800

    [ https://issues.apache.org/jira/browse/HBASE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815463#comment-13815463 ]


Enis Soztutar commented on HBASE-9906:
--------------------------------------

Of the options above, (1) will take some time to fix. (3) has another 
problem: we would be mixing client-supplied and server-supplied 
timestamps, which could cause further problems in meta if clocks are out of 
sync. (4) is not ideal either, since we want to delete the whole row except 
for column info:regioninfo. For that we would have to do a get on each row to 
obtain its columns, then send deletes for each row. That leaves us with 
option (2), which is embarrassing, but given that restore is very infrequent, 
we can justify sleeping an extra 20 ms.
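
To make option (2) concrete, here is a minimal sketch in Python (not HBase code). It assumes the delete and the put each take their timestamp from a millisecond wall clock, as the issue describes. The function names and the injectable clock/sleep parameters are hypothetical and exist only so the collision can be reproduced with a frozen clock.

```python
import time


def now_ms():
    """Current wall-clock time in whole milliseconds."""
    return int(time.time() * 1000)


def delete_then_put_timestamps(pause_s=0.020, clock=now_ms, sleep=time.sleep):
    """Return the timestamps a delete and a following put would receive.

    pause_s models the extra sleep from option (2); clock and sleep are
    injectable (hypothetical parameters) so the behavior can be tested.
    """
    delete_ts = clock()  # timestamp assigned to the delete
    sleep(pause_s)       # wait so the put lands in a later millisecond
    put_ts = clock()     # timestamp assigned to the put
    return delete_ts, put_ts


# Real clock with the 20 ms pause: the put is strictly newer.
delete_ts, put_ts = delete_then_put_timestamps()
print(put_ts > delete_ts)

# Frozen clock and no pause: both writes share one timestamp.
frozen = delete_then_put_timestamps(pause_s=0, clock=lambda: 1000, sleep=lambda s: None)
print(frozen[0] == frozen[1])
```

With the real clock, the pause puts the put in a later millisecond than the delete. With the frozen clock and no pause, both writes get the same timestamp, which is the case that loses the meta entry.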

> Restore snapshot fails to restore the meta edits sporadically  
> ---------------------------------------------------------------
>
>                 Key: HBASE-9906
>                 URL: https://issues.apache.org/jira/browse/HBASE-9906
>             Project: HBase
>          Issue Type: New Feature
>          Components: snapshots
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.96.1, 0.94.14
>
>
> After a snapshot restore, we see failures to find the table in meta:
> {code}
> > disable 'tablefour'
> > restore_snapshot 'snapshot_tablefour'
> > enable 'tablefour'
> ERROR: Table tablefour does not exist.'
> {code}
> This is quite subtle. From the looks of it, we successfully restore the 
> snapshot, do the meta updates, and return the status to the client. The 
> client then tries an operation on the table (like enable table, or 
> scan in the test outputs), which fails because the meta entry for the region 
> seems to be gone (with a single region, the table is reported 
> missing). Subsequent attempts to create the table also fail because 
> the table directories are there, but the meta entries are not.
> To restore the meta entries, we do a delete and then a put to the same 
> region:
> {code}
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to restore: 
> 76d0e2b7ec3291afcaa82e18a56ccc30
> 2013-11-04 10:39:51,582 INFO 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper: region to remove: 
> fa41edf43fe3ee131db4a34b848ff432
> ...
> 2013-11-04 10:39:52,102 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Deleted [{ENCODED => fa41edf43fe3ee131db4a34b848ff432, NAME => 
> 'tablethree_mod,,1383559723345.fa41edf43fe3ee131db4a34b848ff432.', STARTKEY 
> => '', ENDKEY => ''}, {ENCODED => 76d0e2b7ec3291afcaa82e18a56ccc30, NAME => 
> 'tablethree_mod,,1383561123097.76d0e2b7ec3291afcaa82e18a56ccc30.', STARTKE
> 2013-11-04 10:39:52,111 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Added 1
> {code}
> The root cause of this sporadic failure is that the delete and the subsequent 
> put get the same timestamp if they execute in the same ms. The delete 
> then overrides the put at that ts, even though the put was issued later.
> See: HBASE-9905, HBASE-8770
> Credit goes to [~huned] for reporting this bug. 
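
The masking described in the quoted issue can be illustrated with a toy model in Python. This is not HBase's implementation: ToyRow and its methods are hypothetical, and the only rule modeled is that a row-wide delete marker hides every cell whose timestamp is less than or equal to the marker's, regardless of which write arrived first.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyRow:
    """One row: newest put per column, plus the newest row-wide delete marker."""
    puts: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    delete_ts: Optional[int] = None

    def put(self, column: str, ts: int, value: str) -> None:
        current = self.puts.get(column)
        if current is None or ts >= current[0]:
            self.puts[column] = (ts, value)

    def delete_row(self, ts: int) -> None:
        if self.delete_ts is None or ts > self.delete_ts:
            self.delete_ts = ts

    def get(self, column: str) -> Optional[str]:
        cell = self.puts.get(column)
        if cell is None:
            return None
        ts, value = cell
        # The marker hides cells at or below its timestamp, whatever the arrival order.
        if self.delete_ts is not None and ts <= self.delete_ts:
            return None
        return value


# Delete and put in the same millisecond: the put is hidden.
same_ms = ToyRow()
same_ms.delete_row(ts=1000)
same_ms.put("info:regioninfo", ts=1000, value="restored-region")
print(same_ms.get("info:regioninfo"))

# Put one millisecond later: the put is visible.
later_ms = ToyRow()
later_ms.delete_row(ts=1000)
later_ms.put("info:regioninfo", ts=1001, value="restored-region")
print(later_ms.get("info:regioninfo"))
```

In the first case the lookup returns nothing even though the put arrived after the delete, which matches the "Table does not exist" symptom for a single-region table.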



--
This message was sent by Atlassian JIRA
(v6.1#6144)
