[ 
https://issues.apache.org/jira/browse/HDDS-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808012#comment-17808012
 ] 

Hemant Kumar edited comment on HDDS-9486 at 1/18/24 5:47 AM:
-------------------------------------------------------------

I looked at this, and there is a deadlock between checkpoint creation for 
bootstrapping and RocksDBCheckpointDiffer#pruneSstFiles.

Bootstrapping takes BootstrapStateHandler#lock before creating the 
checkpoint and then takes the lock on the RocksDBCheckpointDiffer instance to 
unpause the compaction thread(s). RocksDBCheckpointDiffer#pruneSstFiles, on 
the other hand, is a synchronized method: it first takes the lock on 
RocksDBCheckpointDiffer and then takes BootstrapStateHandler#lock before 
removing any files. The two paths acquire the same two locks in opposite 
order, so each can end up waiting for the lock the other one holds.
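
To make the ordering concrete, here is a minimal, self-contained sketch of the 
inversion described above. It is not the Ozone code: the class, field and 
thread names are hypothetical stand-ins, the Semaphore mirrors what the stack 
trace shows for BootstrapStateHandler$Lock, and the prune side uses a timed 
tryAcquire (an assumption made only so that the demo terminates instead of 
hanging the way the real threads do).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; none of these names exist in Ozone.
class LockOrderInversionDemo {
  // Stand-in for BootstrapStateHandler#lock.
  private final Semaphore bootstrapLock = new Semaphore(1);
  // Stand-in for the monitor of the RocksDBCheckpointDiffer instance.
  private final Object differMonitor = new Object();

  /** Returns true if the prune path managed to take the bootstrap lock. */
  boolean run() throws InterruptedException {
    CountDownLatch bootstrapHoldsLock = new CountDownLatch(1);
    CountDownLatch pruneHoldsMonitor = new CountDownLatch(1);
    AtomicBoolean pruneGotLock = new AtomicBoolean(false);

    // Bootstrap path: bootstrap lock first, then the differ monitor.
    Thread bootstrap = new Thread(() -> {
      try {
        bootstrapLock.acquire();
        try {
          bootstrapHoldsLock.countDown();
          pruneHoldsMonitor.await();      // force the bad interleaving
          synchronized (differMonitor) {
            // the real code would unpause compaction here
          }
        } finally {
          bootstrapLock.release();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "bootstrap");

    // Prune path: differ monitor first, then the bootstrap lock.
    Thread prune = new Thread(() -> {
      try {
        bootstrapHoldsLock.await();
        synchronized (differMonitor) {
          pruneHoldsMonitor.countDown();
          // An untimed acquire() here would wait forever, because the
          // bootstrap thread is blocked on differMonitor while it still
          // holds bootstrapLock.
          boolean got = bootstrapLock.tryAcquire(200, TimeUnit.MILLISECONDS);
          pruneGotLock.set(got);
          if (got) {
            bootstrapLock.release();
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "prune");

    bootstrap.start();
    prune.start();
    prune.join();
    bootstrap.join();
    return pruneGotLock.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("prune acquired bootstrap lock: "
        + new LockOrderInversionDemo().run());
  }
}
```

In this sketch the prune thread always times out, because the bootstrap 
thread cannot release the semaphore until it gets the monitor that the prune 
thread is holding. With a plain acquire(), as in the dump, neither thread 
would ever make progress.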

Looking at this more, I don't think we need this synchronized block: 
https://github.com/apache/ozone/pull/5104/files#diff-4e8bcca4269db3fa926667c07d733f58628b13b417bbd76d06c1683edbbd9750R227
 


was (Author: JIRAUSER297350):
I looked at this, and there is a deadlock between checkpoint creation for 
bootstrapping and RocksDBCheckpointDiffer#pruneSstFiles.

Bootstrapping takes BootstrapStateHandler#lock before creating the 
checkpoint and then takes the lock on the RocksDBCheckpointDiffer instance to 
create the checkpoint. RocksDBCheckpointDiffer#pruneSstFiles, on the other 
hand, is a synchronized method: it first takes the lock on 
RocksDBCheckpointDiffer and then takes BootstrapStateHandler#lock before 
removing any files.

Looking at this more, I don't think we need this synchronized block 
https://github.com/apache/ozone/pull/5104/files#diff-4e8bcca4269db3fa926667c07d733f58628b13b417bbd76d06c1683edbbd9750R227
 

> Intermittent fork timeout in TestSnapshotBackgroundServices
> -----------------------------------------------------------
>
>                 Key: HDDS-9486
>                 URL: https://issues.apache.org/jira/browse/HDDS-9486
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: test
>    Affects Versions: 1.4.0
>            Reporter: Attila Doroszlai
>            Assignee: Hemant Kumar
>            Priority: Major
>         Attachments: 2023-09-07T11-48-29_820-jvmRun1.dump, 
> 2023-09-14T11-32-20_981-jvmRun1.dump
>
>
> Surefire fork for {{TestSnapshotBackgroundServices}} intermittently times out.
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/07/25178/it-om/output.log
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/14/25374/it-om/output.log
> CC [~hemantk], [~mladjangadzic]
> {code}
> "CompactionDagPruningService"
>    java.lang.Thread.State: WAITING
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
>         at org.apache.hadoop.ozone.lock.BootstrapStateHandler$Lock.lock(BootstrapStateHandler.java:31)
>         at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.pruneSstFiles(RocksDBCheckpointDiffer.java:1506)
>         at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer$$Lambda$573/124020389.run(Unknown Source)
> "qtp555959536-13964"
>    java.lang.Thread.State: BLOCKED
>         at org.apache.hadoop.ozone.om.OMDBCheckpointServlet.getCheckpoint(OMDBCheckpointServlet.java:255)
>         at org.apache.hadoop.hdds.utils.DBCheckpointServlet.generateSnapshotCheckpoint(DBCheckpointServlet.java:200)
>         at org.apache.hadoop.hdds.utils.DBCheckpointServlet.doPost(DBCheckpointServlet.java:321)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:523)
> "main"
>    java.lang.Thread.State: BLOCKED
>         at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.close(RocksDBCheckpointDiffer.java:340)
>         at org.apache.hadoop.hdds.utils.IOUtils.close(IOUtils.java:78)
>         at org.apache.hadoop.hdds.utils.IOUtils.close(IOUtils.java:64)
>         at org.apache.hadoop.hdds.utils.IOUtils.closeQuietly(IOUtils.java:92)
>         at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer$RocksDBCheckpointDifferHolder.invalidateCacheEntry(RocksDBCheckpointDiffer.java:1591)
>         at org.apache.hadoop.hdds.utils.db.RDBStore.close(RDBStore.java:224)
>         at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.stop(OmMetadataManagerImpl.java:753)
>         at org.apache.hadoop.ozone.om.OzoneManager.stop(OzoneManager.java:2246)
>         at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.stop(MiniOzoneHAClusterImpl.java:304)
>         at org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:446)
>         at org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices.shutdown(TestSnapshotBackgroundServices.java:199)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
