[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195033#comment-15195033 ]
Jianwei Cui commented on HBASE-15433:
-------------------------------------
{quote}
Instead of getting table region count from quota cache we can get it from
RegionLocator which will solve the corner case you described.
{quote}
If I am not mistaken, this may break other corner cases. For example, suppose
the table has 5 regions, clientA is restoring the table from a snapshot with 8
regions, and clientB is concurrently restoring it from a snapshot with 10
regions:
1. clientA first invokes {{checkAndUpdateNamespaceRegionQuota}} (before
{{restoreSnapshot}}). {{tableRegionCount}} is 5 for clientA, and it updates the
table's region count to 8.
2. Before clientA invokes {{restoreSnapshot}}, clientB invokes
{{checkAndUpdateNamespaceRegionQuota}}. {{tableRegionCount}} is also 5 for
clientB (when read through RegionLocator), and it updates the table's region
count to 10.
3. clientA restores its snapshot successfully, so the actual region count is 8.
4. clientB encounters an IOException in {{restoreSnapshot}} and resets the
region count to 5 in the IOException catch clause. However, the region count
should be 8 because clientA succeeded.
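To make the lost update concrete, here is a single-threaded replay of steps 1-4 in order. TrackedRegionCounts and RestoreRaceReplay are hypothetical names; the map only models the region count the quota manager believes the table has, not HBase's actual quota classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the region count the quota manager *believes*
// each table has. This is not HBase's MasterQuotaManager.
class TrackedRegionCounts {
    private final Map<String, Integer> counts = new HashMap<>();

    int get(String table) {
        return counts.getOrDefault(table, 0);
    }

    void set(String table, int regions) {
        counts.put(table, regions);
    }
}

class RestoreRaceReplay {
    // Replays steps 1-4 on one thread and returns
    // {trackedRegionCount, actualRegionCount}.
    static int[] replay() {
        TrackedRegionCounts quota = new TrackedRegionCounts();
        String table = "t1";      // hypothetical table name
        int actualRegions = 5;    // the table starts with 5 regions
        quota.set(table, actualRegions);

        // Step 1: clientA reads 5 via RegionLocator, then records 8 (its snapshot).
        quota.set(table, 8);

        // Step 2: clientA has not restored yet, so clientB also reads 5,
        // then records 10.
        int seenByB = actualRegions;
        quota.set(table, 10);

        // Step 3: clientA's restore succeeds; the table really has 8 regions.
        actualRegions = 8;

        // Step 4: clientB's restore throws, so it resets the count to what it read.
        quota.set(table, seenByB);

        return new int[] {quota.get(table), actualRegions};
    }

    public static void main(String[] args) {
        int[] r = replay();
        // Prints "tracked=5, actual=8": the quota now under-counts by 3 regions.
        System.out.println("tracked=" + r[0] + ", actual=" + r[1]);
    }
}
```

The rollback value is only correct if nothing else changed the table between the read and the reset, which is exactly what the interleaving violates.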
I think it is not easy to resolve these concurrency issues in {{SnapshotManager}}
without a lock. Maybe we should wait until RestoreSnapshotHandler is rewritten
with procedure v2, and then move the quota update into RestoreSnapshotHandler?
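As a rough sketch of what "with a lock" could mean, assuming a single coarse lock per namespace held across the whole check, update, restore and rollback sequence: because no other restore can run in between, the count read before the update is still accurate when it is used for rollback. NamespaceRestoreGuard, Restore, register and tracked are hypothetical names, not HBase APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one lock per namespace held across
// "check quota -> update -> restore -> roll back on failure".
class NamespaceRestoreGuard {
    // Stand-in for the real restore step; may throw to simulate a failure.
    interface Restore {
        void run() throws Exception;
    }

    private final ReentrantLock namespaceLock = new ReentrantLock();
    private final Map<String, Integer> trackedRegions = new HashMap<>(); // guarded by namespaceLock
    private final int maxRegions;

    NamespaceRestoreGuard(int maxRegions) {
        this.maxRegions = maxRegions;
    }

    void register(String table, int regions) {
        namespaceLock.lock();
        try {
            trackedRegions.put(table, regions);
        } finally {
            namespaceLock.unlock();
        }
    }

    int tracked(String table) {
        namespaceLock.lock();
        try {
            return trackedRegions.getOrDefault(table, 0);
        } finally {
            namespaceLock.unlock();
        }
    }

    void restore(String table, int snapshotRegions, Restore restoreStep) throws Exception {
        namespaceLock.lock();
        try {
            // Read under the lock: no other restore can change it until we finish.
            int previous = trackedRegions.getOrDefault(table, 0);
            int otherTables = 0;
            for (Map.Entry<String, Integer> entry : trackedRegions.entrySet()) {
                if (!entry.getKey().equals(table)) {
                    otherTables += entry.getValue();
                }
            }
            if (otherTables + snapshotRegions > maxRegions) {
                // Nothing has been updated yet, so there is nothing to roll back.
                throw new IllegalStateException("namespace region quota exceeded");
            }
            trackedRegions.put(table, snapshotRegions);
            try {
                restoreStep.run();
            } catch (Exception e) {
                // "previous" is still the true count because we held the lock.
                trackedRegions.put(table, previous);
                throw e;
            }
        } finally {
            namespaceLock.unlock();
        }
    }
}
```

With this guard, clientA's 8-region restore and clientB's failing 10-region restore run one after the other, so clientB rolls back to 8 rather than 5. The cost is that every restore in the namespace is serialized for its full duration, which is presumably part of why moving the quota update into a procedure v2 handler looks more attractive.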
> SnapshotManager#restoreSnapshot not update table and region count quota
> correctly when encountering exception
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-15433
> URL: https://issues.apache.org/jira/browse/HBASE-15433
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.4.0, 1.1.5
>
> Attachments: HBASE-15433-trunk-v1.patch, HBASE-15433-trunk-v2.patch,
> HBASE-15433-trunk.patch, HBASE-15433-v3.patch
>
>
> In SnapshotManager#restoreSnapshot, the table and region quotas are checked
> and updated as follows:
> {code}
> try {
>   // Table already exists. Check and update the region quota for this
>   // table's namespace
>   checkAndUpdateNamespaceRegionQuota(manifest, tableName);
>   restoreSnapshot(snapshot, snapshotTableDesc);
> } catch (IOException e) {
>   this.master.getMasterQuotaManager().removeTableFromNamespaceQuota(tableName);
>   LOG.error("Exception occurred while restoring the snapshot " + snapshot.getName()
>       + " as table " + tableName.getNameAsString(), e);
>   throw e;
> }
> {code}
> 'checkAndUpdateNamespaceRegionQuota' fails if the regions in the snapshot
> would exceed the region count quota, and the table is then removed from the
> namespace quota in the 'catch' block. This decreases the tracked table count
> and region count, so subsequent table creations or region splits will succeed
> even though the actual quota has been exceeded.
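One possible shape of a fix for the single-client case, sketched against a toy in-memory quota: the check updates nothing when it fails, so its exception is simply propagated, and a failure of the restore itself resets the table to its previous count instead of removing it from the quota. ToyNamespaceQuota, SafeRestore and Restore are hypothetical names; this is single-threaded and does not address the concurrent restores discussed in the comment above.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy namespace quota; a hypothetical stand-in for HBase's quota bookkeeping.
class ToyNamespaceQuota {
    final Map<String, Integer> regionsByTable = new HashMap<>();
    final int maxRegions;

    ToyNamespaceQuota(int maxRegions) {
        this.maxRegions = maxRegions;
    }

    int totalRegions() {
        int total = 0;
        for (int n : regionsByTable.values()) {
            total += n;
        }
        return total;
    }

    // Throws without changing anything if the new count would exceed the quota.
    void checkAndUpdate(String table, int newRegions) throws IOException {
        int otherTables = totalRegions() - regionsByTable.getOrDefault(table, 0);
        if (otherTables + newRegions > maxRegions) {
            throw new IOException("namespace region quota exceeded");
        }
        regionsByTable.put(table, newRegions);
    }
}

class SafeRestore {
    // Stand-in for the real restore step (hypothetical).
    interface Restore {
        void run() throws IOException;
    }

    static void restore(ToyNamespaceQuota quota, String table, int snapshotRegions,
                        Restore restoreStep) throws IOException {
        Integer previous = quota.regionsByTable.get(table);
        // If the check fails, the quota was not touched, so just propagate.
        quota.checkAndUpdate(table, snapshotRegions);
        try {
            restoreStep.run();
        } catch (IOException e) {
            // Undo only our own update: put back the previous count instead of
            // removing the table from the namespace quota.
            if (previous == null) {
                quota.regionsByTable.remove(table);
            } else {
                quota.regionsByTable.put(table, previous);
            }
            throw e;
        }
    }
}
```

With a quota of 10 and another table holding 3 regions, restoring an 8-region snapshot is rejected and the table keeps its 5 regions in the quota, so the namespace total stays at 8 instead of dropping to 3.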