[ 
https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755450#comment-13755450
 ] 

Matteo Bertozzi commented on HBASE-9397:
----------------------------------------

Looks good, with one fix needed in:
{code}
-    if (isTakingSnapshot(snapshotTable)) {
-      SnapshotSentinel handler = this.snapshotHandlers.get(snapshotTable);
+    if (isTakingSnapshot(snapshot)) {
+      SnapshotSentinel handler = restoreHandlers.get(snapshotTable);
{code}
This should be snapshotHandlers.get() instead of restoreHandlers.get(). I think 
you copy-pasted it from the code below, which checks whether a restore is in 
progress.


{code}
+  synchronized boolean isTakingSnapshot(final SnapshotDescription snapshot) {
...
+    while (it.hasNext()) {
...
+      if (snapshot.getName().equals(sentinel.getSnapshot().getName()) && !sentinel.isFinished())
...
+    return isTakingSnapshot(snapshotTable);
{code}
nit (I'm fine even without this change): I think the table should be checked 
first. I assume it is more likely to have a job taking a snapshot on the same 
table (e.g. snapshot-table-000N) than to have two jobs taking snapshots with 
the same name (e.g. snapshot-000N).
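
A minimal sketch of the ordering suggested in the nit, not the actual patch. 
SnapshotCheckSketch and Sentinel are hypothetical, simplified stand-ins for 
SnapshotManager and SnapshotSentinel, and the map keyed by table name is an 
assumed layout. The table lookup runs first, and the scan for a running 
snapshot with the same name only happens when the table check finds nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for SnapshotManager; not the actual HBase code.
class SnapshotCheckSketch {
  // Hypothetical stand-in for SnapshotSentinel: a snapshot name and a finished flag.
  static final class Sentinel {
    final String snapshotName;
    boolean finished;
    Sentinel(String snapshotName) { this.snapshotName = snapshotName; }
  }

  // Snapshot handlers keyed by table name (assumed layout).
  private final Map<String, Sentinel> snapshotHandlers = new HashMap<>();

  // Register a running snapshot for a table.
  synchronized void start(String table, String snapshotName) {
    snapshotHandlers.put(table, new Sentinel(snapshotName));
  }

  // Mark the snapshot on a table as finished.
  synchronized void finish(String table) {
    Sentinel s = snapshotHandlers.get(table);
    if (s != null) s.finished = true;
  }

  // Cheap check: is a snapshot still running on this table?
  synchronized boolean isTakingSnapshot(String table) {
    Sentinel s = snapshotHandlers.get(table);
    return s != null && !s.finished;
  }

  // Table check first (single map lookup), then scan all handlers
  // for an unfinished snapshot with the same name.
  synchronized boolean isTakingSnapshot(String table, String snapshotName) {
    if (isTakingSnapshot(table)) return true;
    for (Sentinel s : snapshotHandlers.values()) {
      if (snapshotName.equals(s.snapshotName) && !s.finished) return true;
    }
    return false;
  }
}
```

With a snapshot named snap-1 running on table t1, this sketch rejects both a 
second snapshot on t1 and a snapshot named snap-1 on any other table.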
                
> Snapshots with the same name are allowed to proceed concurrently
> ----------------------------------------------------------------
>
>                 Key: HBASE-9397
>                 URL: https://issues.apache.org/jira/browse/HBASE-9397
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.95.2, 0.94.11
>            Reporter: Jerry He
>            Assignee: Jerry He
>             Fix For: 0.94.12, 0.96.0
>
>         Attachments: HBASE-9397-0.94.patch, HBASE-9397-trunk.patch
>
>
> Snapshots with the same name (but on different tables) are allowed to proceed 
> concurrently.
> This seems to be a loophole created by allowing multiple snapshots (on 
> different tables) to run concurrently.
> There are two checks in SnapshotManager, but they fail to catch this 
> particular case.
> In isSnapshotCompleted(), we only check the completed snapshot directory.
> In isTakingSnapshot(), we only check for the same table name.
> The end result is that concurrently running snapshots with the same name 
> overlap and interfere with each other, for example by cleaning up the other's 
> snapshot working directory in .hbase-snapshot/.tmp/snapshot-name.
> {code}
> 2013-08-29 18:25:13,443 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to exception:Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
>         at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321)
>         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123)
> {code}

