[ https://issues.apache.org/jira/browse/HDDS-8880 ]


    Attila Doroszlai deleted comment on HDDS-8880:
    ----------------------------------------

was (Author: adoroszlai):
OM is waiting for Ratis:

{code}
"IPC Server handler 15 on default port 15004" 
   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
        at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:288)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:250)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestToRatis(OzoneManagerProtocolServerSideTranslatorPB.java:214)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:199)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB$$Lambda$1338/1195943629.apply(Unknown Source)
        at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:147)
{code}
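
For illustration only: the WAITING state together with the waitingGet frame suggests the handler is parked in an untimed CompletableFuture.get(). The minimal sketch below reproduces that thread state with a future that is never completed, and contrasts it with a bounded wait. The class name BlockedHandlerSketch, the helper awaitReply, and the timeout values are hypothetical and are not taken from the Ozone or Ratis code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedHandlerSketch {

  // Hypothetical helper: wait for a reply, but give up after timeoutMs
  // instead of parking the calling thread indefinitely.
  static String awaitReply(CompletableFuture<String> reply, long timeoutMs)
      throws Exception {
    try {
      return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      return "TIMED_OUT";
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for a reply that the consensus layer never completes.
    CompletableFuture<String> neverCompleted = new CompletableFuture<>();

    // Untimed get(): this thread parks until the future completes.
    Thread handler = new Thread(() -> {
      try {
        neverCompleted.get();
      } catch (Exception ignored) {
        // not reached in this sketch
      }
    }, "handler-sketch");
    handler.setDaemon(true); // let the JVM exit even though it stays parked
    handler.start();

    // Poll for up to 5 seconds until the thread has parked.
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
    while (handler.getState() != Thread.State.WAITING
        && System.nanoTime() < deadline) {
      Thread.sleep(10);
    }
    System.out.println(handler.getName() + " state: " + handler.getState());

    // Bounded wait on the same future returns control to the caller.
    System.out.println("bounded wait: " + awaitReply(neverCompleted, 100));
  }
}
```

A bounded wait only turns a silent hang into a visible failure; it would not address whatever keeps the Ratis reply from arriving.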

OM1 is waiting for OM3 to install a snapshot, append to the log, or similar 
(the exact operation varies across runs):

{code}
"omNode-1@group-523986131536->omNode-3-GrpcLogAppender-LogAppenderDaemon" 
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Thread.sleep(Native Method)
        at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:286)
        at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:168)
        at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:78)
{code}

> Intermittent fork timeout in TestOMRatisSnapshots
> -------------------------------------------------
>
>                 Key: HDDS-8880
>                 URL: https://issues.apache.org/jira/browse/HDDS-8880
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Attila Doroszlai
>            Priority: Critical
>         Attachments: it-om.zip
>
>
> Not sure if this has the same root cause as HDDS-8876 or not, so filing 
> separately.
> Surefire fork is killed due to timeout for TestOMRatisSnapshots.
> Main thread is waiting for response from OM at various points (see thread 
> dumps linked).  Example:
> {code}
> "main" 
>    java.lang.Thread.State: WAITING
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:65)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1572)
>         ...
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:304)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.updateKey(OzoneManagerProtocolClientSideTranslatorPB.java:802)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.commitKey(OzoneManagerProtocolClientSideTranslatorPB.java:760)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.commitKey(BlockOutputStreamEntryPool.java:341)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:557)
>         at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:86)
>         at org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey(TestOzoneManagerHA.java:231)
>         at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.writeKeys(TestOMRatisSnapshots.java:1016)
>         at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(TestOMRatisSnapshots.java:205)
> {code}
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/05/30/22802/it-om/2023-05-30T07-30-54_006-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23399/it-om/2023-06-16T10-42-07_017-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23408/it-om/2023-06-16T12-31-16_382-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23417/it-om/2023-06-16T13-41-55_057-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23421/it-om/2023-06-16T14-54-13_823-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23425/it-om/2023-06-16T16-03-34_143-jvmRun1.dump



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
