smengcl opened a new pull request, #4701:

URL: https://github.com/apache/ozone/pull/4701
## What changes were proposed in this pull request?

During snapshot creation, access to `deletedTable` and `deletedDirectoryTable` needs to be synchronized with `KeyDeletingTask` and `DirDeletingTask`. Otherwise, out-of-order access (reads and writes) can leave either table in an inconsistent state.

Here is the code logic that justifies the locks:

### `createOmSnapshotCheckpoint` flow, called from `OMSnapshotCreateResponse#addToDBBatch`

1. Acquire the `getTableLock(deletedDirectoryTable)` write lock, then acquire the `getTableLock(deletedTable)` write lock.
2. In `deletedTable`, remove all keys whose prefix matches the snapshot scope path (bucket).
3. In `deletedDirectoryTable`, remove all keys whose prefix matches the snapshot scope path (bucket).
4. Release the `getTableLock(deletedTable)` write lock, then release the `getTableLock(deletedDirectoryTable)` write lock.

### `KeyDeletingTask#call` flow

1. Acquire the `getTableLock(deletedTable)` write lock.
2. `getPendingDeletionKeys()`: (currently) retrieves a number of keys from the active DB's `deletedTable`.
3. `processKeyDeletes()`: deletes key blocks with the SCM client's `deleteKeyBlocks()`, then submits a `PurgeKeysRequest` Ratis request, which removes the successfully reclaimed keys from the active `deletedTable`.
4. Release the `getTableLock(deletedTable)` write lock.

### `DirDeletingTask#call` flow

1. Acquire the `getTableLock(deletedDirectoryTable)` write lock.
2. Iterate over the active `deletedDirectoryTable` and prepare a list of `PurgePathRequest`s, each containing the immediate children (keys and dirs) of one directory.
3. Acquire the `getTableLock(deletedTable)` write lock.
4. `optimizeDirDeletesAndSubmitRequest()`: recurse further into sub-dirs if the batch limit `pathLimitPerTask` hasn't been reached.
   Q: Can we refactor the dir expansion logic that is duplicated here? [One](https://github.com/apache/ozone/blob/dd003040a41def491e8de003ef8539ce40854972/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/AbstractKeyDeletingService.java#L356-L380), [Two](https://github.com/apache/ozone/blob/fb15c0514252518dcd445936813d1f7ab21b8bc9/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/DirectoryDeletingService.java#L136-L158), [Three](https://github.com/apache/ozone/blob/4578a063533bc1396a218a69613a842ff0b32ec6/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDeletingService.java#L352-L375) @aswinshakil
5. Submit the `PurgePathRequest`s to Ratis.
6. Release the `getTableLock(deletedTable)` write lock.
7. Release the `getTableLock(deletedDirectoryTable)` write lock.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8067

## How was this patch tested?

- All existing tests should pass.
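The lock ordering shared by the three flows described above can be sketched in Java as a minimal, hypothetical model. This is not Ozone code: `TableLockSketch`, `removePrefix`, and the `run*`/`create*` methods are illustrative names, each `ReentrantReadWriteLock` stands in for `getTableLock(...)`, and each `TreeMap` stands in for a sorted RocksDB-backed table. The prefix removal relies on keys sharing a prefix being contiguous in sorted order, similar in spirit to seeking a RocksDB iterator to the prefix.

```java
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical model of the table-lock ordering (not an Ozone API).
 * Each lock stands in for getTableLock(tableName); each TreeMap stands
 * in for a sorted table.
 */
class TableLockSketch {
  final ReentrantReadWriteLock deletedDirectoryTableLock = new ReentrantReadWriteLock();
  final ReentrantReadWriteLock deletedTableLock = new ReentrantReadWriteLock();
  final NavigableMap<String, String> deletedDirectoryTable = new TreeMap<>();
  final NavigableMap<String, String> deletedTable = new TreeMap<>();

  /** Removes every key starting with prefix; such keys are contiguous in sorted order. */
  static void removePrefix(NavigableMap<String, String> table, String prefix) {
    Iterator<String> it = table.tailMap(prefix, true).keySet().iterator();
    while (it.hasNext()) {
      if (!it.next().startsWith(prefix)) {
        break; // first key past the prefix range, nothing further can match
      }
      it.remove(); // removes the entry from the backing map
    }
  }

  /** createOmSnapshotCheckpoint: dir table lock first, then key table lock. */
  void createSnapshotCheckpoint(String bucketPrefix) {
    deletedDirectoryTableLock.writeLock().lock();
    try {
      deletedTableLock.writeLock().lock();
      try {
        removePrefix(deletedTable, bucketPrefix);
        removePrefix(deletedDirectoryTable, bucketPrefix);
      } finally {
        deletedTableLock.writeLock().unlock(); // release in reverse order
      }
    } finally {
      deletedDirectoryTableLock.writeLock().unlock();
    }
  }

  /** KeyDeletingTask#call: holds only the deletedTable lock. */
  void runKeyDeletingTask(Runnable processKeyDeletes) {
    deletedTableLock.writeLock().lock();
    try {
      processKeyDeletes.run(); // stands in for getPendingDeletionKeys() + processKeyDeletes()
    } finally {
      deletedTableLock.writeLock().unlock();
    }
  }

  /** DirDeletingTask#call: same order as snapshot creation, so no lock cycle. */
  void runDirDeletingTask(Runnable preparePurgePaths, Runnable submitPurgePaths) {
    deletedDirectoryTableLock.writeLock().lock();
    try {
      preparePurgePaths.run(); // scan deletedDirectoryTable, build purge requests
      deletedTableLock.writeLock().lock();
      try {
        submitPurgePaths.run(); // expand sub-dirs, submit purge requests
      } finally {
        deletedTableLock.writeLock().unlock();
      }
    } finally {
      deletedDirectoryTableLock.writeLock().unlock();
    }
  }
}
```

In this model, the two flows that take both locks always take the `deletedDirectoryTable` lock before the `deletedTable` lock, and the key-deleting flow only ever holds the `deletedTable` lock, so no thread can hold one lock while waiting for the other in the opposite order. For example, with an assumed key layout where `deletedTable` holds `/vol1/bucket1/key1` and `/vol1/bucket10/key1`, `createSnapshotCheckpoint("/vol1/bucket1/")` removes only the first entry; the trailing `/` keeps the sibling bucket `bucket10` from matching.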
