[
https://issues.apache.org/jira/browse/HDDS-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aswin Shakil updated HDDS-10076:
--------------------------------
Resolution: Duplicate
Status: Resolved (was: Patch Available)
> SnapshotCache closes RocksDB instance with Reference
> ----------------------------------------------------
>
> Key: HDDS-10076
> URL: https://issues.apache.org/jira/browse/HDDS-10076
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Aswin Shakil
> Assignee: Aswin Shakil
> Priority: Major
> Labels: pull-request-available
>
> When accessing a snapshot, we use SnapshotCache to load and retrieve the
> snapshot's RocksDB instance. Whenever a background process calls
> SnapshotCache's *get*, we also clean up the pending eviction list.
> There is a scenario where Thread 1 (KeyDeletingService) is executing
> *get->cleanup* while Thread 2 (SstFilteringService) is executing *get*.
> Although *get* has incremented the snapshot's reference count, we still close
> the RocksDB instance, because the *cleanup* method assumes everything in the
> pending eviction list has a reference count of 0. That is not the case, so we
> need to recheck the reference count before closing the RocksDB instance.
> Otherwise we end up in this scenario:
>
> {code:java}
> 2023-12-18 19:19:28,739 INFO
> [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache:
> Loading snapshot. Table key: /vol-t2gj8/buck-07uux/snap-5griw
> 2023-12-18 19:19:28,741 ERROR
> [SstFilteringService#0]-org.apache.hadoop.ozone.om.SstFilteringService:
> Exception encountered while filtering a snapshot
> java.io.IOException: Rocks Database is closed
> 2023-12-18 19:20:28,739 INFO
> [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache:
> Loading snapshot. Table key: /vol-t2gj8/buck-07uux/snap-5griw
> 2023-12-18 19:20:28,768 WARN
> [KeyDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService:
> Background task execution failed
> java.lang.IllegalStateException: Cache map entry removal failure. The cache
> is in an inconsistent state. Expected OmSnapshot instance:
> org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@4f63f85e, actual:
> org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@7656056
> 2023-12-18 19:20:28,768 WARN
> [SstFilteringService#0]-org.apache.hadoop.hdds.utils.BackgroundService:
> Background task execution failed
> java.lang.IllegalStateException: Cache map entry removal failure. The cache
> is in an inconsistent state. Expected OmSnapshot instance:
> org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@4f63f85e, actual: null
> 2023-12-18 19:21:06,486 WARN
> [Finalizer]-org.apache.hadoop.ozone.om.OmSnapshot:
> org.apache.hadoop.hdds.utils.db.RDBStore@4e5ac786 is not closed properly.
> snapshotName: snap-5griw
> 2023-12-18 19:21:28,742 ERROR
> [SstFilteringService#0]-org.apache.hadoop.ozone.om.OmSnapshotManager: Failed
> to retrieve snapshot: /vol-t2gj8/buck-07uux/snap-5griw
> java.io.IOException: Failed init RocksDB, db path :
> /var/lib/hadoop-ozone/om/data913140/db.snapshots/checkpointState/om.db-4e72e3fd-58e4-4814-b8fa-869fb3e8741b,
> exception :org.rocksdb.RocksDBException lock hold by current process,
> acquire time 1702927228 acquiring thread 139777345017600:
> /var/lib/hadoop-ozone/om/data913140/db.snapshots/checkpointState/om.db-4e72e3fd-58e4-4814-b8fa-869fb3e8741b/LOCK:
> No locks available
> at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
> at
> org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
> at
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:598)
> at
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:406)
> at
> org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:357)
> at
> org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
> at
> org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$0(SnapshotCache.java:171)
> at
> java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
> at
> org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:167)
> at
> org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:153)
> {code}
>
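The recheck described above can be sketched as follows. This is a minimal illustration, not Ozone's SnapshotCache code: FakeDb, CountedDb, SketchSnapshotCache and their methods are hypothetical names, the RocksDB handle is replaced by a stand-in, and unlike a real cache it evicts every unreferenced entry instead of honoring a size limit. The point it shows is that cleanup re-reads the reference count inside ConcurrentHashMap.computeIfPresent, so a concurrent get on the same key cannot slip in between the check and the close.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a snapshot's RocksDB handle. */
final class FakeDb implements AutoCloseable {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  boolean isClosed() { return closed.get(); }
  @Override public void close() { closed.set(true); }
}

/** Hypothetical cache entry: a handle plus its reference count. */
final class CountedDb {
  final FakeDb db = new FakeDb();
  // Only read or written inside compute/computeIfPresent for this entry's key.
  long refCount;
}

final class SketchSnapshotCache {
  private final ConcurrentHashMap<String, CountedDb> dbMap =
      new ConcurrentHashMap<>();
  private final Set<String> pendingEviction = ConcurrentHashMap.newKeySet();

  /** Loads the entry if absent, takes a reference, then runs cleanup. */
  CountedDb get(String key) {
    CountedDb entry = dbMap.compute(key, (k, v) -> {
      CountedDb e = (v == null) ? new CountedDb() : v;
      e.refCount++;
      // The key may still sit in pendingEviction; cleanup must cope with that.
      return e;
    });
    cleanup();
    return entry;
  }

  /** Drops a reference; an unreferenced entry is queued for eviction. */
  void release(String key) {
    dbMap.computeIfPresent(key, (k, v) -> {
      if (v.refCount <= 0) {
        throw new IllegalStateException("release without matching get: " + k);
      }
      if (--v.refCount == 0) {
        pendingEviction.add(k);
      }
      return v;
    });
  }

  /** Closes queued entries, rechecking the count atomically per key. */
  void cleanup() {
    for (String key : pendingEviction) {
      dbMap.computeIfPresent(key, (k, v) -> {
        pendingEviction.remove(k);
        if (v.refCount > 0) {
          return v;      // re-acquired since it was queued: keep it open
        }
        v.db.close();    // still unreferenced: safe to close
        return null;     // returning null removes the map entry
      });
    }
  }
}
```

In this sketch, a get that races with cleanup either runs first, in which case cleanup sees a positive count and keeps the handle open, or runs after the entry was removed, in which case it loads a fresh handle. The two threads never share a handle that one of them has closed.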