[
https://issues.apache.org/jira/browse/HDDS-14856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066625#comment-18066625
]
Ethan Rose commented on HDDS-14856:
-----------------------------------
This is Cursor's analysis of OM code that might still have issues after
HDDS-14800, since that change was primarily targeting Datanodes:
h4. A. Unprotected WAL Iterators (RocksDatabase#getUpdatesSince)
While newIterator() was secured, RocksDatabase#getUpdatesSince() was missed. It
still exhibits the original drop-the-lock anti-pattern:
{code:java}
// RocksDatabase.java
public ManagedTransactionLogIterator getUpdatesSince(long sequenceNumber)
    throws RocksDatabaseException {
  try (UncheckedAutoCloseable ignored = acquire()) {
    return managed(db.get().getUpdatesSince(sequenceNumber));
  } // <--- Lock is immediately released here!
  // ...
}
{code}
If a background thread is reading WAL updates and the DB is closed
concurrently, the JVM will crash.
Recommendation: Mirror the changes you made to
ManagedRocksIterator. Update ManagedTransactionLogIterator to accept and hold
an UncheckedAutoCloseable dbRef, and update RocksDatabase#getUpdatesSince() to
pass the acquire() reference into it.
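A minimal sketch of that recommendation, using hypothetical stand-in classes rather than the real Ozone types: SketchDatabase plays the role of RocksDatabase, DbRef the role of UncheckedAutoCloseable, and SketchLogIterator the role of ManagedTransactionLogIterator. It assumes the reference counter defers physical destruction until the last reference is released; the real wrapper may behave differently (for example, by making close() wait).

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for UncheckedAutoCloseable: close() throws no checked exception. */
interface DbRef extends AutoCloseable {
  @Override
  void close();
}

/** Hypothetical stand-in for RocksDatabase; destruction is deferred while references are held. */
final class SketchDatabase {
  private final List<String> wal = List.of("put k1", "put k2", "delete k1");
  private int refCount;
  private boolean closeRequested;
  private boolean destroyed;

  synchronized DbRef acquire() {
    if (closeRequested) {
      throw new IllegalStateException("database is closed");
    }
    refCount++;
    return new DbRef() {
      private boolean released;

      @Override
      public void close() {
        synchronized (SketchDatabase.this) {
          if (released) {
            return; // releasing twice must not decrement twice
          }
          released = true;
          refCount--;
          destroyIfUnused();
        }
      }
    };
  }

  /** The iterator takes ownership of the reference instead of dropping it on return. */
  SketchLogIterator getUpdatesSince(int sequenceNumber) {
    DbRef ref = acquire();
    try {
      return new SketchLogIterator(wal.subList(sequenceNumber, wal.size()).iterator(), ref);
    } catch (RuntimeException e) {
      ref.close(); // do not leak the reference if the iterator cannot be created
      throw e;
    }
  }

  synchronized void close() {
    closeRequested = true;
    destroyIfUnused();
  }

  synchronized boolean isDestroyed() {
    return destroyed;
  }

  // Callers must hold the monitor of this object.
  private void destroyIfUnused() {
    if (closeRequested && refCount == 0) {
      destroyed = true; // the native handle would be freed here
    }
  }
}

/** Hypothetical stand-in for ManagedTransactionLogIterator that holds a dbRef. */
final class SketchLogIterator implements AutoCloseable {
  private final Iterator<String> delegate;
  private final DbRef dbRef;

  SketchLogIterator(Iterator<String> delegate, DbRef dbRef) {
    this.delegate = delegate;
    this.dbRef = dbRef;
  }

  boolean hasNext() {
    return delegate.hasNext();
  }

  String next() {
    return delegate.next();
  }

  @Override
  public void close() {
    // A real implementation would close the native iterator first, then release the reference.
    dbRef.close();
  }
}
```

In this sketch, calling close() on the database while a SketchLogIterator is open only marks the database as closing; the simulated destruction happens when the iterator itself is closed.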
h4. B. Ozone Manager Bypassing RocksDatabase
The Datanode logic is now well-protected because it leverages the RocksDatabase
wrapper. However, several critical classes in Ozone Manager directly invoke
newIterator() on raw ManagedRocksDB objects, completely bypassing the reference
counting layer.
If the underlying RocksDB is closed dynamically (e.g., an
OmSnapshot instance being evicted/closed, or during OM reconfiguration) while a
background operation is scanning, the JVM will still suffer a native crash.
Notable examples include:
* RocksDbPersistentMap, RocksDbPersistentList, RocksDbPersistentSet: Used
heavily in the OM Snapshot layer.
* SnapshotDiffCleanupService: The background cleanup thread iterates over
snapDiffJobCfh entirely unprotected.
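One possible shape for guarding such call sites, sketched with hypothetical names (CountedHandle and its scan method do not exist in Ozone): instead of handing out a raw iterator, the store runs the caller's visitor while holding a reference, so a close requested mid-scan is deferred until the scan returns. This is an illustration of the idea under those assumptions, not the fix adopted for this issue.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Hypothetical ref-counted store; a TreeMap stands in for the RocksDB column family. */
final class CountedHandle {
  private final TreeMap<String, String> table = new TreeMap<>();
  private int refs;
  private boolean closeRequested;
  private boolean destroyed;

  synchronized void put(String key, String value) {
    table.put(key, value);
  }

  /** Visits every entry while holding a reference, so close() cannot destroy the store mid-scan. */
  void scan(BiConsumer<String, String> visitor) {
    acquire();
    try {
      for (Map.Entry<String, String> e : table.entrySet()) {
        visitor.accept(e.getKey(), e.getValue());
      }
    } finally {
      release(); // always give the reference back, even if the visitor throws
    }
  }

  synchronized void close() {
    closeRequested = true;
    destroyIfUnused();
  }

  synchronized boolean isDestroyed() {
    return destroyed;
  }

  private synchronized void acquire() {
    if (closeRequested) {
      throw new IllegalStateException("store is closed");
    }
    refs++;
  }

  private synchronized void release() {
    refs--;
    destroyIfUnused();
  }

  // Callers must hold the monitor of this object.
  private void destroyIfUnused() {
    if (closeRequested && refs == 0) {
      destroyed = true; // the native handle would be freed here
    }
  }
}
```

A caller such as a background cleanup task would pass its per-entry logic to scan rather than creating an iterator on the raw handle; a scan started after close() fails fast with an IllegalStateException instead of touching freed memory.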
> Guard ManagedRocksDB direct iterators against concurrent DB close (follow-up
> to HDDS-14800)
> -------------------------------------------------------------------------------------------
>
> Key: HDDS-14856
> URL: https://issues.apache.org/jira/browse/HDDS-14856
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Priyesh K
> Priority: Major
>
> HDDS-14800 fixed a TOCTOU race condition for the main volume iterator path by
> having {{ManagedRocksIterator}} hold the {{RocksDatabase}} reference counter
> for the full lifetime of the iterator. This prevents the DB from being
> physically destroyed while the iterator is in use.
> However, the fix only applies to iterators created through
> {{RocksDatabase.newIterator()}}. Several components bypass this layer and
> call {{ManagedRocksDB.get().newIterator()}} directly, so the counter is never
> acquired and the same race window exists for those iterators.