[ 
https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195033#comment-15195033
 ] 

Jianwei Cui commented on HBASE-15433:
-------------------------------------

{quote}
Instead of getting table region count from quota cache we can get it from 
RegionLocator which will solve the corner case you described.
{quote}
If I am not mistaken, this may break other corner cases. For example, suppose 
the table has 5 regions, clientA is trying to restore the table to a snapshot 
with 8 regions, and clientB is trying to restore it to a snapshot with 10 
regions. Then:
1. clientA first invokes {{checkAndUpdateNamespaceRegionQuota}} before 
{{restoreSnapshot}}; the {{tableRegionCount}} is 5 for clientA, and it updates 
the region count of the table to 8.
2. Before clientA invokes {{restoreSnapshot}}, clientB invokes 
{{checkAndUpdateNamespaceRegionQuota}} before its own {{restoreSnapshot}}; the 
{{tableRegionCount}} is also 5 (when using RegionLocator) for clientB, and it 
updates the region count of the table to 10.
3. clientA successfully restores its snapshot, so the actual region count 
is 8.
4. clientB encounters an IOException in {{restoreSnapshot}} and resets the 
region count to 5 in the catch clause. However, the region count should be 8 
because clientA succeeded.
I think it is not easy to resolve the concurrency issues in {{SnapshotManager}} 
without a lock. Could we wait until RestoreSnapshotHandler is rewritten on 
procedure v2 and then move the quota update into RestoreSnapshotHandler?
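
To make the interleaving in steps 1-4 concrete, here is a minimal stand-alone 
sketch that replays it sequentially. It is only an illustration under stated 
assumptions: {{QuotaRaceSketch}}, {{checkAndUpdate}}, {{rollback}} and 
{{replay}} are hypothetical names, the map stands in for the quota cache, and 
none of this is the actual HBase code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the namespace quota cache; not the HBase API. */
class QuotaRaceSketch {
    // Region count that the (simulated) quota cache records per table.
    private final Map<String, Integer> cachedRegionCount = new HashMap<>();

    QuotaRaceSketch(String table, int regions) {
        cachedRegionCount.put(table, regions);
    }

    /**
     * Models checkAndUpdateNamespaceRegionQuota: records the snapshot's region
     * count and returns the count the caller would roll back to on failure.
     */
    int checkAndUpdate(String table, int regionsSeenByCaller, int snapshotRegions) {
        cachedRegionCount.put(table, snapshotRegions);
        return regionsSeenByCaller;
    }

    /** Models the reset performed in the catch clause. */
    void rollback(String table, int previous) {
        cachedRegionCount.put(table, previous);
    }

    int cached(String table) {
        return cachedRegionCount.get(table);
    }

    /** Replays steps 1-4 and returns {cached count, actual count}. */
    static int[] replay() {
        String table = "t";
        int actualRegions = 5; // what a RegionLocator would report
        QuotaRaceSketch quota = new QuotaRaceSketch(table, actualRegions);

        // Step 1: clientA sees 5 regions; the cache becomes 8.
        quota.checkAndUpdate(table, actualRegions, 8);
        // Step 2: clientB also sees 5, because clientA has not restored yet;
        // the cache becomes 10.
        int rollbackB = quota.checkAndUpdate(table, actualRegions, 10);
        // Step 3: clientA's restore succeeds, so the table really has 8 regions.
        actualRegions = 8;
        // Step 4: clientB's restore fails; its catch clause resets the count to 5.
        quota.rollback(table, rollbackB);

        return new int[] { quota.cached(table), actualRegions };
    }

    public static void main(String[] args) {
        int[] result = replay();
        // Prints "cached=5 actual=8": the cache under-counts by 3 regions.
        System.out.println("cached=" + result[0] + " actual=" + result[1]);
    }
}
```

The two values diverge because each client captures its rollback value before 
the other client's restore completes, which is why I think this needs a lock 
or a single serialized place for the quota update.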

> SnapshotManager#restoreSnapshot not update table and region count quota 
> correctly when encountering exception
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15433
>                 URL: https://issues.apache.org/jira/browse/HBASE-15433
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
>             Fix For: 2.0.0, 1.3.0, 1.2.1, 1.4.0, 1.1.5
>
>         Attachments: HBASE-15433-trunk-v1.patch, HBASE-15433-trunk-v2.patch, 
> HBASE-15433-trunk.patch, HBASE-15433-v3.patch
>
>
> In SnapshotManager#restoreSnapshot, the table and region quota will be 
> checked and updated as:
> {code}
>       try {
>         // Table already exist. Check and update the region quota for this table namespace
>         checkAndUpdateNamespaceRegionQuota(manifest, tableName);
>         restoreSnapshot(snapshot, snapshotTableDesc);
>       } catch (IOException e) {
>         this.master.getMasterQuotaManager().removeTableFromNamespaceQuota(tableName);
>         LOG.error("Exception occurred while restoring the snapshot " + snapshot.getName()
>             + " as table " + tableName.getNameAsString(), e);
>         throw e;
>       }
> {code}
> 'checkAndUpdateNamespaceRegionQuota' will fail if the regions in the snapshot 
> would exceed the region count quota, and the table is then removed from the 
> namespace quota in the 'catch' block. This decreases the tracked table count 
> and region count, so a subsequent table creation or region split will succeed 
> even if the actual quota is exceeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
