George Jahad created HDDS-8952:
----------------------------------
Summary: MiniOzoneHAClusterImpl frequently hangs when writing keys
to bucket.
Key: HDDS-8952
URL: https://issues.apache.org/jira/browse/HDDS-8952
Project: Apache Ozone
Issue Type: Sub-task
Reporter: George Jahad
MiniOzoneHAClusterImpl frequently hangs when writing keys to bucket.
To show the problem, we have a simple test that just writes 10,000 keys. It
regularly hangs/times out when running on the MiniOzoneHAClusterImpl cluster.
This is a problem because the MiniOzoneHAClusterImpl is critical for testing
the snapshot bootstrap code. We believe it is the root cause of this issue:
https://issues.apache.org/jira/browse/HDDS-8876
This simple test shows the problem:
https://github.com/GeorgeJahad/ozone/compare/153659032b..testSimple
In my experience it hangs 20-30% of the time when run on repeat in IntelliJ.
When it does hang, the client thread is in the process of writing/committing the
key, while the server-side translator thread is waiting on a future. I believe
that future is waiting on a response from the active follower. I've included
stack traces for both threads below, captured while the test was hung.
Occasionally when it hangs, we see the client thread in exactly the same place
but there is no corresponding server-side translator thread. We are guessing
that in this case the server-side thread gets killed by an unhandled exception,
but the client-side thread doesn't notice and just waits forever instead of
retrying.
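To illustrate the difference between waiting forever and waiting with a bound, here is a minimal, self-contained sketch. It is not the Hadoop RPC or Ozone code; `BoundedWait` and `callWithRetry` are hypothetical names, and the future that never completes stands in for a reply that never arrives. Retrying like this is only safe when the request is idempotent or deduplicated on the server side.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only.
class BoundedWait {

  /**
   * Calls submit up to maxAttempts times and waits at most timeoutMs for
   * each reply, instead of blocking indefinitely on a single future.
   */
  static <T> T callWithRetry(Supplier<CompletableFuture<T>> submit,
                             int maxAttempts, long timeoutMs)
      throws InterruptedException, ExecutionException, TimeoutException {
    TimeoutException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      CompletableFuture<T> reply = submit.get();
      try {
        // Bounded wait: a plain get() with no timeout would park here forever.
        return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        reply.cancel(true);  // give up on this attempt and try again
        last = e;
      }
    }
    throw last != null ? last : new TimeoutException("no attempts made");
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    String result = callWithRetry(() -> {
      calls[0]++;
      if (calls[0] == 1) {
        // First attempt: a future nobody completes, simulating a lost reply.
        return new CompletableFuture<String>();
      }
      return CompletableFuture.completedFuture("committed");
    }, 3, 200);
    System.out.println(result + " after " + calls[0] + " attempts");
    // prints "committed after 2 attempts"
  }
}
```

With an unbounded `get()`, the first attempt in this example would block the caller indefinitely, which matches the symptom described above.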
Here are the stack traces for the two threads.
{{Client thread:}}
{{"main@1" prio=5 tid=0x1 nid=NA waiting}}
{{ java.lang.Thread.State: WAITING}}
{{ at java.lang.Object.wait(Object.java:-1)}}
{{ at java.lang.Object.wait(Object.java:502)}}
{{ at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:65)}}
{{ at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1572)}}
{{ at org.apache.hadoop.ipc.Client.call(Client.java:1530)}}
{{ at org.apache.hadoop.ipc.Client.call(Client.java:1427)}}
{{ at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:250)}}
{{ at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:132)}}
{{ at com.sun.proxy.$Proxy54.submitRequest(Unknown Source:-1)}}
{{ at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source:-1)}}
{{ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
{{ at java.lang.reflect.Method.invoke(Method.java:498)}}
{{ at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)}}
{{ at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)}}
{{ at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)}}
{{ at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)}}
{{ - locked <0x208df> (a org.apache.hadoop.io.retry.RetryInvocationHandler$Call)}}
{{ at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)}}
{{ at com.sun.proxy.$Proxy54.submitRequest(Unknown Source:-1)}}
{{ at org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransport.submitRequest(Hadoop3OmTransport.java:80)}}
{{ at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:304)}}
{{ at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.updateKey(OzoneManagerProtocolClientSideTranslatorPB.java:802)}}
{{ at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.commitKey(OzoneManagerProtocolClientSideTranslatorPB.java:760)}}
{{ at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.commitKey(BlockOutputStreamEntryPool.java:341)}}
{{ at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:557)}}
{{ - locked <0x208e0> (a org.apache.hadoop.ozone.client.io.KeyOutputStream)}}
{{ at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:86)}}
{{ - locked <0x208e1> (a org.apache.hadoop.ozone.client.io.OzoneOutputStream)}}
{{ at org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey(TestOzoneManagerHA.java:247)}}
{{ at org.apache.hadoop.ozone.om.TestOMRatisSnapshots.writeKeys(TestOMRatisSnapshots.java:1085)}}
{{Server thread:}}
{{"IPC Server handler 5 on default port 15133@101855" daemon prio=5 tid=0x159ab nid=NA waiting}}
{{ java.lang.Thread.State: WAITING}}
{{ at sun.misc.Unsafe.park(Unsafe.java:-1)}}
{{ at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)}}
{{ at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)}}
{{ at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)}}
{{ at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)}}
{{ at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)}}
{{ at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:293)}}
{{ at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:250)}}
{{ at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestToRatis(OzoneManagerProtocolServerSideTranslatorPB.java:215)}}
{{ at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:200)}}
{{ at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB$$Lambda$1300.1755468564.apply(Unknown Source:-1)}}
{{ at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)}}
{{ at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:142)}}
{{ at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java:-1)}}
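Traces like the ones above were taken by hand from the debugger. As a rough sketch of how a hanging run could be detected and dumped automatically, the example below runs a workload under a time limit and prints every thread's stack if the limit is exceeded. `HangWatchdog`, `runWithWatchdog` and `dumpAllThreads` are hypothetical names, not part of Ozone or its test framework, and the loop in `main` is only a placeholder for the 10,000-key write.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only.
class HangWatchdog {

  /** Formats the stack of every live thread in this JVM. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e
        : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append("\" state=")
          .append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("  at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  /**
   * Runs the workload on a daemon thread. Returns true if it finished within
   * the limit; otherwise prints a thread dump to stderr and returns false.
   */
  static boolean runWithWatchdog(Runnable workload, long timeout, TimeUnit unit)
      throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "workload");
      t.setDaemon(true);  // a stuck workload must not keep the JVM alive
      return t;
    });
    try {
      Future<?> result = executor.submit(workload);
      try {
        result.get(timeout, unit);
        return true;
      } catch (TimeoutException e) {
        System.err.println(dumpAllThreads());
        return false;
      }
    } finally {
      executor.shutdownNow();  // interrupts the worker if it is still running
    }
  }

  public static void main(String[] args) throws Exception {
    Runnable workload = () -> {
      for (int i = 0; i < 10_000; i++) {
        // Placeholder: a real test would write key i to the bucket here.
      }
    };
    boolean finished = runWithWatchdog(workload, 5, TimeUnit.MINUTES);
    System.out.println(finished ? "finished" : "hung; thread dump on stderr");
  }
}
```

Only threads in the same JVM are covered, which is enough for a mini cluster that runs the client and the servers in one process.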