[
https://issues.apache.org/jira/browse/RATIS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898452#comment-17898452
]
Ivan Andika commented on RATIS-2189:
------------------------------------
[~szetszwo] Thanks for checking this out.
> Which version of Ozone/Ratis are you testing? We saw a similar problem
> previously. Not sure if it is the same.
We are using Ozone 1.4.1 and Ratis 3.1.1. May I know which problem you are
referring to?
Currently, we are exploring Ozone as the S3 remote storage for Kafka's tiered
storage feature. However, during a stress test we found that when Kafka
increased the write and read workload on the S3G, the datanodes threw
"java.lang.OutOfMemoryError: Direct buffer memory", causing writes to get
stuck. From the following stack trace, I assume that it was interfering with
Hadoop RPC, which also uses direct buffer memory:
{code:java}
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:695)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
at sun.nio.ch.IOUtil.write(IOUtil.java:58)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:158)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:116)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at org.apache.hadoop.ipc.Client$IpcStreams.sendRequest(Client.java:1930)
at org.apache.hadoop.ipc.Client$Connection$RpcRequestSender.run(Client.java:1113)
at java.lang.Thread.run(Thread.java:748){code}
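To check how close a process is to exhausting direct memory before this error appears, the JVM's buffer pool MXBeans can be sampled. The following is a minimal sketch using only the standard java.lang.management API; the class name DirectBufferReport and its helper methods are hypothetical and are not part of Ozone or Ratis. In a running datanode the same bean would more likely be read over JMX than from a main method.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical helper: reports usage of the JVM's "direct" NIO buffer pool. */
final class DirectBufferReport {
  private DirectBufferReport() {}

  /** Returns the MXBean of the "direct" pool, or null if the JVM exposes none. */
  static BufferPoolMXBean directPool() {
    List<BufferPoolMXBean> pools =
        ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
    for (BufferPoolMXBean pool : pools) {
      if ("direct".equals(pool.getName())) {
        return pool;
      }
    }
    return null;
  }

  /** Formats buffer count, used memory and total capacity of a pool. */
  static String describe(BufferPoolMXBean pool) {
    return String.format("pool=%s count=%d used=%d bytes capacity=%d bytes",
        pool.getName(), pool.getCount(), pool.getMemoryUsed(),
        pool.getTotalCapacity());
  }

  public static void main(String[] args) {
    BufferPoolMXBean direct = directPool();
    if (direct == null) {
      System.out.println("This JVM does not expose a direct buffer pool bean");
      return;
    }
    System.out.println("before: " + describe(direct));
    // Allocate 1 MiB of direct memory, the same size as the stream
    // requests in the logs, to make the change in the counters visible.
    ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
    System.out.println("after:  " + describe(direct));
    // Use the buffer after sampling so it stays reachable until here.
    System.out.println("allocated capacity: " + buffer.capacity());
  }
}
```

The pool is capped by the -XX:MaxDirectMemorySize JVM option, so comparing the reported usage against that limit indicates how much headroom is left. Depending on its configuration, Netty may track some of its direct allocations with its own counter, so this bean may understate the total.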
We also encountered TimeoutIOExceptions on both the client and the DN side:
{code:java}
2024-11-13 11:33:43,537 [NettyClientStreamRpc-workerGroup--thread121] ERROR org.apache.ratis.client.impl.OrderedStreamAsync: Failed to send request, header=DataStreamRequestHeader:clientId=client-53089C94F05D,type=STREAM_DATA,id=139369,offset=65011712,length=1048576
java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.TimeoutIOException: Timeout 10s: Failed to send DataStreamRequestByteBuffer:clientId=client-53089C94F05D,type=STREAM_DATA,id=139369,offset=65011712,length=1048576 via channel [id: 0x5ac464c1, L:/10.80.133.23:42750 - R:10.80.135.22/10.80.135.22:9855]
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at org.apache.ratis.netty.client.NettyClientStreamRpc.lambda$null$1(NettyClientStreamRpc.java:470)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:405)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ratis.protocol.exceptions.TimeoutIOException: Timeout 10s: Failed to send DataStreamRequestByteBuffer:clientId=client-53089C94F05D,type=STREAM_DATA,id=139369,offset=65011712,length=1048576 via channel [id: 0x5ac464c1, L:/10.80.133.23:42750 - R:10.80.135.22/10.80.135.22:9855]
... 10 more
2024-11-13 11:33:43,537 [NettyClientStreamRpc-workerGroup--thread121] ERROR org.apache.ratis.client.impl.OrderedStreamAsync: Failed to send request, header=DataStreamRequestHeader:clientId=client-53089C94F05D,type=STREAM_DATA,id=139369,offset=66060288,length=1048576
java.util.concurrent.CompletionException: java.lang.IllegalStateException: Request{streamOffset=66060288, type=STREAM_DATA}, : Request{streamOffset=65011712, type=STREAM_DATA} failed
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at org.apache.ratis.netty.client.NettyClientReplies$ReplyEntry.completeExceptionally(NettyClientReplies.java:167)
at org.apache.ratis.netty.client.NettyClientReplies$ReplyMap.completeExceptionally(NettyClientReplies.java:86)
at org.apache.ratis.netty.client.NettyClientReplies$ReplyMap.failAll(NettyClientReplies.java:92)
at org.apache.ratis.netty.client.NettyClientReplies$ReplyMap.fail(NettyClientReplies.java:97)
at org.apache.ratis.netty.client.NettyClientStreamRpc.lambda$null$1(NettyClientStreamRpc.java:472)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:405)
at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Request{streamOffset=66060288, type=STREAM_DATA}, : Request{streamOffset=65011712, type=STREAM_DATA} failed
... 12 more
2024-11-13 11:33:43,537 [qtp582300198-205] WARN org.apache.hadoop.hdds.scm.storage.BlockDataStreamOutput: Failed to write all chunks through stream: java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.TimeoutIOException: Timeout 10s: Failed to send DataStreamRequestByteBuffer:clientId=client-53089C94F05D,type=STREAM_DATA,id=139369,offset=65011712,length=1048576 via channel [id: 0x5ac464c1, L:/10.80.133.23:42750 - R:10.80.135.22/10.80.135.22:9855] {code}
May I know whether this is expected when the datanodes are fully utilized due
to high read and write traffic?
> Use ByteBufAllocator#ioBuffer in NettyDataStreamUtils
> -----------------------------------------------------
>
> Key: RATIS-2189
> URL: https://issues.apache.org/jira/browse/RATIS-2189
> Project: Ratis
> Issue Type: Improvement
> Components: Streaming
> Reporter: Ivan Andika
> Priority: Minor
>
> Currently, NettyDataStreamUtils uses ByteBufAllocator#directBuffer, which
> forces every ByteBufAllocator to allocate direct buffers, even
> PreferHeapByteBufAllocator (e.g. when we set
> -Dorg.apache.ratis.thirdparty.io.netty.noPreferDirect=true).
> It's better to use ioBuffer and delegate the choice of memory type to the
> actual ByteBufAllocator.
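
The difference described in the issue summary can be illustrated with a simplified model. This is only a sketch built on java.nio.ByteBuffer: SketchAllocator, PreferDirectSketch, PreferHeapSketch and IoBufferSketch are hypothetical names, not Netty or Ratis classes, and Netty's real logic for choosing between heap and direct memory is more involved than shown here.

```java
import java.nio.ByteBuffer;

/** Hypothetical, simplified stand-in for an allocator; NOT Netty's ByteBufAllocator. */
interface SketchAllocator {
  ByteBuffer heapBuffer(int capacity);

  ByteBuffer directBuffer(int capacity);

  /** Buffer intended for I/O; the allocator itself decides heap vs. direct. */
  ByteBuffer ioBuffer(int capacity);
}

/** Models an allocator that prefers direct memory for I/O. */
final class PreferDirectSketch implements SketchAllocator {
  public ByteBuffer heapBuffer(int capacity) { return ByteBuffer.allocate(capacity); }
  public ByteBuffer directBuffer(int capacity) { return ByteBuffer.allocateDirect(capacity); }
  public ByteBuffer ioBuffer(int capacity) { return directBuffer(capacity); }
}

/** Models an allocator that prefers heap memory for I/O. */
final class PreferHeapSketch implements SketchAllocator {
  public ByteBuffer heapBuffer(int capacity) { return ByteBuffer.allocate(capacity); }
  public ByteBuffer directBuffer(int capacity) { return ByteBuffer.allocateDirect(capacity); }
  public ByteBuffer ioBuffer(int capacity) { return heapBuffer(capacity); }
}

final class IoBufferSketch {
  private IoBufferSketch() {}

  /** Caller hard-codes direct memory: the allocator's preference is ignored. */
  static ByteBuffer allocateWithDirectBuffer(SketchAllocator allocator, int capacity) {
    return allocator.directBuffer(capacity);
  }

  /** Caller asks for an I/O buffer: the allocator's preference is respected. */
  static ByteBuffer allocateWithIoBuffer(SketchAllocator allocator, int capacity) {
    return allocator.ioBuffer(capacity);
  }

  public static void main(String[] args) {
    SketchAllocator preferHeap = new PreferHeapSketch();
    System.out.println("directBuffer, prefer-heap allocator: isDirect="
        + allocateWithDirectBuffer(preferHeap, 1024).isDirect());
    System.out.println("ioBuffer, prefer-heap allocator: isDirect="
        + allocateWithIoBuffer(preferHeap, 1024).isDirect());
  }
}
```

In this model, only the ioBuffer call site lets a heap-preferring allocator keep its buffers off direct memory, which is the behaviour the issue asks for.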