[ https://issues.apache.org/jira/browse/HDDS-8880 ]
Attila Doroszlai deleted comment on HDDS-8880:
----------------------------------------
was (Author: adoroszlai):
OM is waiting for Ratis:
{code}
"IPC Server handler 15 on default port 15004"
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:288)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:250)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestToRatis(OzoneManagerProtocolServerSideTranslatorPB.java:214)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:199)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB$$Lambda$1338/1195943629.apply(Unknown
Source)
at
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:147)
{code}
OM1 is waiting for OM3 to install a snapshot, append a log entry, or similar (this varies across runs):
{code}
"omNode-1@group-523986131536->omNode-3-GrpcLogAppender-LogAppenderDaemon"
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Thread.sleep(Native Method)
        at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:286)
        at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:168)
        at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:78)
{code}
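The two thread states in the dumps above come from different kinds of blocking: the OM handler is parked in an untimed {{CompletableFuture.get()}} (reported as WAITING), while the log appender thread is inside {{Thread.sleep}} (reported as TIMED_WAITING). A minimal standalone sketch using only JDK classes reproduces both states; {{ThreadStateDemo}} and its methods are hypothetical names, not Ozone or Ratis code, and the sketch does not model the actual OM/Ratis interaction.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical demo class, not part of Ozone or Ratis.
public class ThreadStateDemo {

    // Polls (up to ~5 s) until thread t reaches the target state; returns the last observed state.
    static Thread.State awaitState(Thread t, Thread.State target) {
        Thread.State s = t.getState();
        for (int i = 0; i < 500 && s != target; i++) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            s = t.getState();
        }
        return s;
    }

    // A thread calling the untimed CompletableFuture.get() on a future nobody completes
    // parks indefinitely, which a thread dump reports as WAITING.
    static Thread.State stateWhileBlockedOnFuture() {
        CompletableFuture<Void> pending = new CompletableFuture<>();
        Thread waiter = new Thread(() -> {
            try {
                pending.get(); // no timeout: returns only once the future completes
            } catch (InterruptedException | ExecutionException e) {
                // not expected in this demo
            }
        }, "waiter");
        waiter.start();
        Thread.State observed = awaitState(waiter, Thread.State.WAITING);
        pending.complete(null); // unblock the waiter so the demo can exit
        joinQuietly(waiter);
        return observed;
    }

    // A thread inside Thread.sleep(...) is reported as TIMED_WAITING,
    // like the GrpcLogAppender thread in the dump.
    static Thread.State stateWhileSleeping() {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                // woken early by the demo
            }
        }, "sleeper");
        sleeper.start();
        Thread.State observed = awaitState(sleeper, Thread.State.TIMED_WAITING);
        sleeper.interrupt(); // cut the sleep short
        joinQuietly(sleeper);
        return observed;
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(stateWhileBlockedOnFuture()); // WAITING
        System.out.println(stateWhileSleeping());        // TIMED_WAITING
    }
}
{code}

Because the untimed {{get()}} returns only when the future completes (or the thread is interrupted), a handler thread in this state stays parked for as long as the submitted request is not completed.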
> Intermittent fork timeout in TestOMRatisSnapshots
> -------------------------------------------------
>
> Key: HDDS-8880
> URL: https://issues.apache.org/jira/browse/HDDS-8880
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Attila Doroszlai
> Priority: Critical
> Attachments: it-om.zip
>
>
> Not sure whether this has the same root cause as HDDS-8876, so filing it separately.
> The Surefire fork is killed due to a timeout in TestOMRatisSnapshots.
> The main thread is waiting for a response from OM at various points (see the linked thread dumps). Example:
> {code}
> "main"
>    java.lang.Thread.State: WAITING
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:65)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1572)
>         ...
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:304)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.updateKey(OzoneManagerProtocolClientSideTranslatorPB.java:802)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.commitKey(OzoneManagerProtocolClientSideTranslatorPB.java:760)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.commitKey(BlockOutputStreamEntryPool.java:341)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:557)
>         at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:86)
>         at org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey(TestOzoneManagerHA.java:231)
>         at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.writeKeys(TestOMRatisSnapshots.java:1016)
>         at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.testInstallSnapshot(TestOMRatisSnapshots.java:205)
> {code}
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/05/30/22802/it-om/2023-05-30T07-30-54_006-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23399/it-om/2023-06-16T10-42-07_017-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23408/it-om/2023-06-16T12-31-16_382-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23417/it-om/2023-06-16T13-41-55_057-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23421/it-om/2023-06-16T14-54-13_823-jvmRun1.dump
> * https://github.com/adoroszlai/ozone-build-results/blob/master/2023/06/16/23425/it-om/2023-06-16T16-03-34_143-jvmRun1.dump