[ 
https://issues.apache.org/jira/browse/HDDS-14856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066625#comment-18066625
 ] 

Ethan Rose commented on HDDS-14856:
-----------------------------------

This is Cursor's analysis of OM code that may still have issues after
HDDS-14800, since that change primarily targeted Datanodes:
 
h4. A. Unprotected WAL Iterators (RocksDatabase#getUpdatesSince)

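While RocksDatabase#newIterator() was secured, RocksDatabase#getUpdatesSince()
was missed. It still exhibits the original drop-the-lock anti-pattern: the
acquire() reference is released as soon as the method returns, while the caller
is still using the returned iterator: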
 
{code:java}
// RocksDatabase.java
public ManagedTransactionLogIterator getUpdatesSince(long sequenceNumber)
    throws RocksDatabaseException {
  try (UncheckedAutoCloseable ignored = acquire()) {
    return managed(db.get().getUpdatesSince(sequenceNumber));
  } // <--- Lock is immediately released here!
  // ...
}
{code}
 
If a background thread is reading WAL updates while the DB is closed
concurrently, the iterator can touch native memory that has already been freed,
and the JVM can crash.

Recommendation: Mirror the changes made to ManagedRocksIterator in HDDS-14800.
Update ManagedTransactionLogIterator to accept and hold an UncheckedAutoCloseable
dbRef, and update RocksDatabase#getUpdatesSince() to pass the acquire()
reference into it.
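A minimal, self-contained sketch of that recommendation follows. It is not
Ozone code: SketchDatabase, SketchLogIterator and the list-backed "WAL" are
hypothetical stand-ins for RocksDatabase, ManagedTransactionLogIterator and
RocksDB's transaction log. UncheckedAutoCloseable is redeclared locally so the
example compiles without Ratis. The point it illustrates is ownership transfer:
getUpdatesSince() acquires the reference outside try-with-resources and hands it
to the iterator, which releases it in close(). As a result, the counter stays
above zero for as long as the caller can still read updates.

{code:java}
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Local stand-in for Ratis' UncheckedAutoCloseable (close() throws no checked exception).
interface UncheckedAutoCloseable extends AutoCloseable {
  @Override
  void close();
}

// Hypothetical stand-in for RocksDatabase; the list plays the role of the WAL.
final class SketchDatabase {
  private final AtomicLong counter = new AtomicLong();
  private final List<String> wal;

  SketchDatabase(List<String> wal) {
    this.wal = wal;
  }

  // Increments the reference counter; the returned handle decrements it at most once.
  UncheckedAutoCloseable acquire() {
    counter.incrementAndGet();
    AtomicBoolean released = new AtomicBoolean();
    return () -> {
      if (released.compareAndSet(false, true)) {
        counter.decrementAndGet();
      }
    };
  }

  // Fixed shape: the reference is handed to the iterator instead of being
  // released when this method returns.
  SketchLogIterator getUpdatesSince(int sequenceNumber) {
    UncheckedAutoCloseable dbRef = acquire(); // deliberately not try-with-resources
    try {
      Iterator<String> updates = wal.subList(sequenceNumber, wal.size()).iterator();
      return new SketchLogIterator(updates, dbRef);
    } catch (RuntimeException e) {
      dbRef.close(); // release only if the iterator could not be created
      throw e;
    }
  }

  long refCount() {
    return counter.get();
  }
}

// Hypothetical analogue of ManagedTransactionLogIterator that owns the DB reference.
final class SketchLogIterator implements UncheckedAutoCloseable {
  private final Iterator<String> delegate;
  private final UncheckedAutoCloseable dbRef;

  SketchLogIterator(Iterator<String> delegate, UncheckedAutoCloseable dbRef) {
    this.delegate = delegate;
    this.dbRef = dbRef;
  }

  boolean hasNext() {
    return delegate.hasNext();
  }

  String next() {
    return delegate.next();
  }

  @Override
  public void close() {
    // A real implementation would close the native iterator first, then
    // release the reference so the DB may be closed.
    dbRef.close();
  }
}
{code}

Under this design the release moves into the iterator's close(), so every caller
of getUpdatesSince() has to close the returned iterator, ideally with
try-with-resources. A leaked iterator would keep the counter above zero and
could stall a later DB close.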
h4. B. Ozone Manager Bypassing RocksDatabase

The Datanode logic is now well protected because it goes through the
RocksDatabase wrapper. However, several critical Ozone Manager classes invoke
newIterator() directly on raw ManagedRocksDB objects, bypassing the
reference-counting layer entirely. If the underlying RocksDB is closed
dynamically (e.g., an OmSnapshot instance is evicted or closed, or during OM
reconfiguration) while a background operation is scanning, the JVM can still
crash in native code. Notable examples include:
 * RocksDbPersistentMap, RocksDbPersistentList, RocksDbPersistentSet: Used 
heavily in the OM Snapshot layer.

 * SnapshotDiffCleanupService: The background cleanup thread iterates over 
snapDiffJobCfh entirely unprotected.
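For the OM classes that bypass the wrapper, one possible shape of the fix is
sketched below. It assumes that each such class is handed a reference-counting
guard instead of a raw ManagedRocksDB. DbGuard and GuardedPersistentMap are
hypothetical names, and a TreeMap stands in for the RocksDB column family. The
sketch holds the reference for the entire scan, not only while the iterator is
being created. That is exactly what the getUpdatesSince() snippet above gets
wrong.

{code:java}
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

// Local stand-in for Ratis' UncheckedAutoCloseable.
interface UncheckedAutoCloseable extends AutoCloseable {
  @Override
  void close();
}

// Hypothetical reference-counting guard standing in for RocksDatabase#acquire().
final class DbGuard {
  private final AtomicLong counter = new AtomicLong();
  private final AtomicBoolean closed = new AtomicBoolean();

  UncheckedAutoCloseable acquire() {
    counter.incrementAndGet(); // count first, then check, so a closer cannot miss us
    if (closed.get()) {
      counter.decrementAndGet();
      throw new IllegalStateException("DB is closed");
    }
    return counter::decrementAndGet;
  }

  // Sketch only: a real close() must also wait for refCount() to reach zero
  // before freeing the native handle.
  void markClosed() {
    closed.set(true);
  }

  long refCount() {
    return counter.get();
  }
}

// Hypothetical replacement for a RocksDbPersistentMap-style class that used to
// call newIterator() on the raw ManagedRocksDB.
final class GuardedPersistentMap {
  private final DbGuard guard;
  private final TreeMap<String, String> store = new TreeMap<>(); // stands in for a column family

  GuardedPersistentMap(DbGuard guard) {
    this.guard = guard;
  }

  void put(String key, String value) {
    try (UncheckedAutoCloseable ignored = guard.acquire()) {
      store.put(key, value);
    }
  }

  // The reference is held for the whole scan, not just while the iterator is created.
  void forEach(BiConsumer<String, String> action) {
    try (UncheckedAutoCloseable ignored = guard.acquire()) {
      for (Map.Entry<String, String> e : store.entrySet()) {
        action.accept(e.getKey(), e.getValue());
      }
    }
  }
}
{code}

With this shape, a scan that starts after the DB has been closed fails with an
IllegalStateException in Java instead of dereferencing a freed native handle. A
background thread such as SnapshotDiffCleanupService would then need to catch
and handle that exception.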

> Guard ManagedRocksDB direct iterators against concurrent DB close (follow-up 
> to HDDS-14800)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDDS-14856
>                 URL: https://issues.apache.org/jira/browse/HDDS-14856
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Priyesh K
>            Priority: Major
>
> HDDS-14800 fixed a TOCTOU race condition for the main volume iterator path by 
> having {{ManagedRocksIterator}} hold the {{RocksDatabase}} reference counter 
> for the full lifetime of the iterator. This prevents the DB from being 
> physically destroyed while the iterator is in use.
> However, the fix only applies to iterators created through 
> {{RocksDatabase.newIterator()}}. Several components bypass this layer and 
> call {{ManagedRocksDB.get().newIterator()}} directly, so the counter is never 
> acquired and the same race window exists for those iterators.
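The guarantee described in the issue depends on how acquire() and close()
interact: the DB must not be physically destroyed while an iterator holds the
counter. The sketch below shows one way to make that race-free, assuming
close() waits for the counter to drain. CountedHandle is a hypothetical class,
not Ozone's RocksDatabase, and a "destroyed" flag stands in for freeing the
native handle. acquire() increments the counter before checking the closed
flag, so there is no window in which a caller has seen "not closed" but is not
yet counted. close() sets the flag first and then blocks until every
outstanding reference has been released.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Local stand-in for Ratis' UncheckedAutoCloseable.
interface UncheckedAutoCloseable extends AutoCloseable {
  @Override
  void close();
}

// Hypothetical reference-counted DB handle; names are illustrative only.
final class CountedHandle {
  private final AtomicBoolean closed = new AtomicBoolean();
  private final AtomicLong counter = new AtomicLong();
  private final Object monitor = new Object();
  private volatile boolean destroyed;

  UncheckedAutoCloseable acquire() {
    counter.incrementAndGet(); // 1. register as a user of the DB
    if (closed.get()) {        // 2. only then check for close: no check-then-act gap
      release();
      throw new IllegalStateException("DB is closed");
    }
    AtomicBoolean released = new AtomicBoolean();
    return () -> {
      if (released.compareAndSet(false, true)) {
        release();
      }
    };
  }

  private void release() {
    if (counter.decrementAndGet() == 0) {
      synchronized (monitor) {
        monitor.notifyAll(); // wake a close() that is waiting for the count to drain
      }
    }
  }

  // Rejects new users, then blocks until every outstanding reference is released.
  void close() throws InterruptedException {
    if (!closed.compareAndSet(false, true)) {
      return;
    }
    synchronized (monitor) {
      while (counter.get() > 0) {
        monitor.wait();
      }
    }
    destroyed = true; // stand-in for freeing the native RocksDB handle
  }

  boolean isDestroyed() {
    return destroyed;
  }
}
{code}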



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
