swamirishi commented on code in PR #6456:
URL: https://github.com/apache/ozone/pull/6456#discussion_r1548656889
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java:
##########
@@ -80,33 +87,62 @@ public OMClientResponse validateAndUpdateCache(OzoneManager
ozoneManager, TermIn
Map<String, SnapshotInfo> updatedPathPreviousAndGlobalSnapshots =
new HashMap<>();
- // Snapshots that are purged by the SnapshotDeletingService
- // will update the next snapshot so that is can be deep cleaned
- // by the KeyDeletingService in the next run.
+ // Each snapshot purge operation does three things:
+ // 1. Update the snapshot chain,
+ // 2. Update the deep clean flag for the next active snapshot (So that
it can be
+ // deep cleaned by the KeyDeletingService in the next run),
+ // 3. Finally, purge the snapshot.
+ // All of these steps have to be performed only when it acquires all the
necessary
+ // locks (lock on the snapshot to be purged, lock on the next active
snapshot, and
+ // lock on the next path and global previous snapshots). Ideally, there
is no need
+ // for locks for snapshot purge and can rely on OMStateMachine because
OMStateMachine
+ // is going to process each request sequentially.
+ //
+ // But there is a problem with that. After filtering unnecessary SST
files for a snapshot,
+ // SstFilteringService updates that snapshot's SstFilter flag.
SstFilteringService cannot
+ // use SetSnapshotProperty API because it runs on each OM independently
and One OM does
+ // not know if the snapshot has been filtered on the other OM in HA
environment.
+ //
+ // If locks are not taken snapshot purge and SstFilteringService will
cause a race condition
+ // and override one's update with another.
for (String snapTableKey : snapshotDbKeys) {
- SnapshotInfo fromSnapshot = omMetadataManager.getSnapshotInfoTable()
- .get(snapTableKey);
-
- if (fromSnapshot == null) {
- // Snapshot may have been purged in the previous iteration of
SnapshotDeletingService.
- LOG.warn("The snapshot {} is not longer in snapshot table, It maybe
removed in the previous " +
- "Snapshot purge request.", snapTableKey);
- continue;
- }
+ // To acquire all the locks, a set is maintained which is keyed by
snapshotTableKey.
+ // snapshotTableKey is nothing but /volumeName/bucketName/snapshotName.
+ // Once all the locks are acquired, it performs the three steps
mentioned above and
+ // release all the locks after that.
+ Set<String> lockSet = new HashSet<>();
Review Comment:
Since we know there are going to be only 2 elements in the set. Should we
set the initial capacity to 2 and load factor as 1?
```suggestion
Set<String> lockSet = new HashSet<>(2, 1.0);
```
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java:
##########
@@ -120,20 +156,39 @@ public OMClientResponse
validateAndUpdateCache(OzoneManager ozoneManager, TermIn
return omClientResponse;
}
+ private void acquireLock(Set<String> lockSet, String snapshotTableKey,
+ OMMetadataManager omMetadataManager) throws
IOException {
+ if (!lockSet.contains(snapshotTableKey)) {
+ Triple<String, String, String> triple =
SnapshotUtils.getSnapshotTableKeyFromString(snapshotTableKey);
Review Comment:
Why write another function? We can get this info from snapshotInfo ?
OmMetadataManager should anyways have all snapshot info in memory.
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java:
##########
@@ -80,33 +87,62 @@ public OMClientResponse validateAndUpdateCache(OzoneManager
ozoneManager, TermIn
Map<String, SnapshotInfo> updatedPathPreviousAndGlobalSnapshots =
new HashMap<>();
- // Snapshots that are purged by the SnapshotDeletingService
- // will update the next snapshot so that is can be deep cleaned
- // by the KeyDeletingService in the next run.
+ // Each snapshot purge operation does three things:
+ // 1. Update the snapshot chain,
+ // 2. Update the deep clean flag for the next active snapshot (So that
it can be
+ // deep cleaned by the KeyDeletingService in the next run),
+ // 3. Finally, purge the snapshot.
+ // All of these steps have to be performed only when it acquires all the
necessary
+ // locks (lock on the snapshot to be purged, lock on the next active
snapshot, and
+ // lock on the next path and global previous snapshots). Ideally, there
is no need
+ // for locks for snapshot purge and can rely on OMStateMachine because
OMStateMachine
+ // is going to process each request sequentially.
+ //
+ // But there is a problem with that. After filtering unnecessary SST
files for a snapshot,
+ // SstFilteringService updates that snapshot's SstFilter flag.
SstFilteringService cannot
+ // use SetSnapshotProperty API because it runs on each OM independently
and One OM does
+ // not know if the snapshot has been filtered on the other OM in HA
environment.
+ //
+ // If locks are not taken snapshot purge and SstFilteringService will
cause a race condition
+ // and override one's update with another.
for (String snapTableKey : snapshotDbKeys) {
- SnapshotInfo fromSnapshot = omMetadataManager.getSnapshotInfoTable()
- .get(snapTableKey);
-
- if (fromSnapshot == null) {
- // Snapshot may have been purged in the previous iteration of
SnapshotDeletingService.
- LOG.warn("The snapshot {} is not longer in snapshot table, It maybe
removed in the previous " +
- "Snapshot purge request.", snapTableKey);
- continue;
- }
+ // To acquire all the locks, a set is maintained which is keyed by
snapshotTableKey.
+ // snapshotTableKey is nothing but /volumeName/bucketName/snapshotName.
+ // Once all the locks are acquired, it performs the three steps
mentioned above and
+ // release all the locks after that.
+ Set<String> lockSet = new HashSet<>();
+ try {
+ acquireLock(lockSet, snapTableKey, omMetadataManager);
Review Comment:
It would be better we take lock in the sorted order of snapTableKey
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java:
##########
@@ -171,58 +226,63 @@ private void updateSnapshotChainAndCache(
return;
}
- // Updates next path snapshot's previous snapshot ID
+ String nextPathSnapshotKey = null;
+
if (hasNextPathSnapshot) {
UUID nextPathSnapshotId = snapshotChainManager.nextPathSnapshot(
snapInfo.getSnapshotPath(), snapInfo.getSnapshotId());
-
- String snapshotTableKey = snapshotChainManager
+ nextPathSnapshotKey = snapshotChainManager
.getTableKey(nextPathSnapshotId);
- nextPathSnapInfo = metadataManager.getSnapshotInfoTable()
- .get(snapshotTableKey);
- if (nextPathSnapInfo != null) {
- nextPathSnapInfo.setPathPreviousSnapshotId(
- snapInfo.getPathPreviousSnapshotId());
- metadataManager.getSnapshotInfoTable().addCacheEntry(
- new CacheKey<>(nextPathSnapInfo.getTableKey()),
- CacheValue.get(trxnLogIndex, nextPathSnapInfo));
- updatedPathPreviousAndGlobalSnapshots
- .put(nextPathSnapInfo.getTableKey(), nextPathSnapInfo);
- }
+
+ // Acquire lock from the snapshot
+ acquireLock(lockSet, nextPathSnapshotKey, metadataManager);
}
- // Updates next global snapshot's previous snapshot ID
+ String nestGlobalSnapshotKey = null;
if (hasNextGlobalSnapshot) {
- UUID nextGlobalSnapshotId =
- snapshotChainManager.nextGlobalSnapshot(snapInfo.getSnapshotId());
-
- String snapshotTableKey = snapshotChainManager
- .getTableKey(nextGlobalSnapshotId);
-
- SnapshotInfo nextGlobalSnapInfo = metadataManager.getSnapshotInfoTable()
- .get(snapshotTableKey);
- // If both next global and path snapshot are same, it may overwrite
- // nextPathSnapInfo.setPathPreviousSnapshotID(), adding this check
- // will prevent it.
- if (nextGlobalSnapInfo != null && nextPathSnapInfo != null &&
- nextGlobalSnapInfo.getSnapshotId().equals(
- nextPathSnapInfo.getSnapshotId())) {
- nextPathSnapInfo.setGlobalPreviousSnapshotId(
- snapInfo.getGlobalPreviousSnapshotId());
- metadataManager.getSnapshotInfoTable().addCacheEntry(
- new CacheKey<>(nextPathSnapInfo.getTableKey()),
- CacheValue.get(trxnLogIndex, nextPathSnapInfo));
- updatedPathPreviousAndGlobalSnapshots
- .put(nextPathSnapInfo.getTableKey(), nextPathSnapInfo);
- } else if (nextGlobalSnapInfo != null) {
- nextGlobalSnapInfo.setGlobalPreviousSnapshotId(
- snapInfo.getGlobalPreviousSnapshotId());
- metadataManager.getSnapshotInfoTable().addCacheEntry(
- new CacheKey<>(nextGlobalSnapInfo.getTableKey()),
- CacheValue.get(trxnLogIndex, nextGlobalSnapInfo));
- updatedPathPreviousAndGlobalSnapshots
- .put(nextGlobalSnapInfo.getTableKey(), nextGlobalSnapInfo);
- }
+ UUID nextGlobalSnapshotId =
snapshotChainManager.nextGlobalSnapshot(snapInfo.getSnapshotId());
+ nestGlobalSnapshotKey =
snapshotChainManager.getTableKey(nextGlobalSnapshotId);
Review Comment:
nextGlobalSnapshotKey*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]