[ 
https://issues.apache.org/jira/browse/HDDS-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswin Shakil updated HDDS-10076:
--------------------------------
    Description: 
While accessing a snapshot, we use SnapshotCache to load and retrieve the 
snapshot's RocksDB instance. When multiple background processes access 
SnapshotCache's *get*, we also clean up the pending eviction list. 

There is a scenario where Thread 1 (KeyDeletingService) is executing 
*get->cleanup* while Thread 2 (SSTFilteringService) is executing *get*. 
Although *get* has incremented the snapshot's reference count, we still close 
the RocksDB instance, because the *cleanup* method assumes everything in the 
pending eviction list has a reference count of 0. That is not always the case, 
so we need to recheck the reference count when closing the RocksDB instance. 
Otherwise we end up in this scenario:

 
{code:java}
2023-12-18 19:19:28,739 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-t2gj8/buck-07uux/snap-5griw
2023-12-18 19:19:28,741 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.SstFilteringService: Exception encountered while filtering a snapshot
java.io.IOException: Rocks Database is closed
2023-12-18 19:20:28,739 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-t2gj8/buck-07uux/snap-5griw
2023-12-18 19:20:28,768 WARN [KeyDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService: Background task execution failed
java.lang.IllegalStateException: Cache map entry removal failure. The cache is in an inconsistent state. Expected OmSnapshot instance: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@4f63f85e, actual: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@7656056
2023-12-18 19:20:28,768 WARN [SstFilteringService#0]-org.apache.hadoop.hdds.utils.BackgroundService: Background task execution failed
java.lang.IllegalStateException: Cache map entry removal failure. The cache is in an inconsistent state. Expected OmSnapshot instance: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@4f63f85e, actual: null
2023-12-18 19:21:06,486 WARN [Finalizer]-org.apache.hadoop.ozone.om.OmSnapshot: org.apache.hadoop.hdds.utils.db.RDBStore@4e5ac786 is not closed properly. snapshotName: snap-5griw
2023-12-18 19:21:28,742 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.OmSnapshotManager: Failed to retrieve snapshot: /vol-t2gj8/buck-07uux/snap-5griw
java.io.IOException: Failed init RocksDB, db path : /var/lib/hadoop-ozone/om/data913140/db.snapshots/checkpointState/om.db-4e72e3fd-58e4-4814-b8fa-869fb3e8741b, exception :org.rocksdb.RocksDBException lock hold by current process, acquire time 1702927228 acquiring thread 139777345017600: /var/lib/hadoop-ozone/om/data913140/db.snapshots/checkpointState/om.db-4e72e3fd-58e4-4814-b8fa-869fb3e8741b/LOCK: No locks available
        at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
        at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
        at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:598)
        at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:406)
        at org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:357)
        at org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
        at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$0(SnapshotCache.java:171)
        at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
        at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:167)
        at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:153)
{code}
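
The recheck described above could look roughly like the sketch below. This is not the actual SnapshotCache code: the class RefCountedCacheSketch, the Handle type, and the release method are hypothetical stand-ins, and the only assumption carried over from the stack trace is that the cache map is a ConcurrentHashMap updated through compute. The idea is that cleanup re-reads the reference count inside the same per-key atomic step that removes the entry, so a concurrent *get* cannot slip in between the check and the close.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only; all names here are hypothetical, not Ozone's API. */
class RefCountedCacheSketch {

  /** Stand-in for a snapshot's RocksDB handle. */
  static final class Handle {
    // Only touched inside dbMap.compute(...) for this handle's key,
    // so the per-key lock of ConcurrentHashMap guards it.
    private long refCount;
    private volatile boolean closed;

    boolean isClosed() { return closed; }
    void close() { closed = true; }
  }

  private final ConcurrentHashMap<String, Handle> dbMap =
      new ConcurrentHashMap<>();
  private final Set<String> pendingEviction = ConcurrentHashMap.newKeySet();

  /** Loads the handle if absent and takes a reference on it. */
  Handle get(String key) {
    return dbMap.compute(key, (k, h) -> {
      if (h == null) {
        h = new Handle();           // the real cache would open RocksDB here
      }
      h.refCount++;
      return h;
    });
  }

  /** Drops a reference; at zero the key becomes a candidate for eviction. */
  void release(String key) {
    dbMap.computeIfPresent(key, (k, h) -> {
      if (h.refCount > 0 && --h.refCount == 0) {
        pendingEviction.add(k);
      }
      return h;
    });
  }

  /** Closes pending entries, rechecking the count before each close. */
  void cleanup() {
    for (String key : pendingEviction) {
      dbMap.compute(key, (k, h) -> {
        pendingEviction.remove(k);
        if (h == null || h.refCount > 0) {
          return h;                 // re-acquired since it was queued: keep it
        }
        h.close();
        return null;                // returning null removes the map entry
      });
    }
  }
}
```

Because get, release, and cleanup all go through compute on the same key, the count check and the close happen as one atomic step per snapshot. A handle that was queued for eviction and then re-acquired is simply dropped from the pending list and queued again once its count returns to 0.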


 

  was:While accessing snapshot, We use SnapshotCache to load and retrieve a 
snapshot's RocksDB instance. When Multiple background process 


> SnapshotCache closes RocksDB instance with Reference
> ----------------------------------------------------
>
>                 Key: HDDS-10076
>                 URL: https://issues.apache.org/jira/browse/HDDS-10076
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Aswin Shakil
>            Assignee: Aswin Shakil
>            Priority: Major
>


