jerqi commented on PR #1521:
URL: https://github.com/apache/incubator-uniffle/pull/1521#issuecomment-1942988443
> I reopened this PR.
>
> After stress testing the shuffle server without this PR, we easily encounter `OutOfDirectMemoryError`, which means this PR is necessary.
>
>     [epollEventLoopGroup-3-45] [WARN] TransportChannelHandler.exceptionCaught - Exception in connection from /127.0.0.1:58767
>     io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 byte(s) of direct memory (used: 161061273600, max: 161061273600)
>         at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:843)
>         at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:772)
>         at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:710)
>         at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:685)
>         at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:212)
>         at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:194)
>         at io.netty.buffer.PoolArena.allocate(PoolArena.java:136)
>         at io.netty.buffer.PoolArena.allocate(PoolArena.java:126)
>         at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:397)
>         at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
>         at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
>         at org.apache.uniffle.common.netty.protocol.Decoders.decodeShuffleBlockInfo(Decoders.java:50)
>         at org.apache.uniffle.common.netty.protocol.SendShuffleDataRequest.decodePartitionData(SendShuffleDataRequest.java:95)
>         at org.apache.uniffle.common.netty.protocol.SendShuffleDataRequest.decode(SendShuffleDataRequest.java:107)
>         at org.apache.uniffle.common.netty.protocol.Message.decode(Message.java:145)
>         at org.apache.uniffle.common.netty.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:77)
>
> We can see that each time an out-of-direct-memory error occurs, it is caused by the code at `org.apache.uniffle.common.netty.protocol.Decoders.decodeShuffleBlockInfo(Decoders.java:50)`, which is `ByteBuf data = NettyUtils.getNettyBufferAllocator().directBuffer(dataLength)`. This is the most direct trigger for insufficient direct memory.
>
> When a large number of requests arrive simultaneously, there may be a brief period (before the `TransportFrameDecoder` has a chance to `release` the `ByteBuf`) during which the shuffle server holds twice as many `ByteBuf`s. This means that, for a very short time, direct memory usage is doubled, which is very hard to control. That is why it is very easy to cause an out-of-direct-memory error without this PR.
>
> So, we need this PR anyway. It might slow down the flushing process a little bit, but the shuffle server will at least remain available during the whole stress test.
>
> From the results of my stress tests, there doesn't seem to be any impact on performance. In fact, it may even be faster, since not reallocating new `ByteBuf`s speeds up the decoding process. There have been no anomalies or performance issues caused by the slower flushing process. Eventually, all these buffers will be flushed, and all `ByteBuf`s will be successfully released, with no memory leaks.
>
> PTAL @jerqi

Maybe we should modify our flush strategy, too. Currently we flush the larger reduce partitions. But if the map partition contains a smaller reduce partition, that memory won't be released either.
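
To make the doubling described in the quoted comment concrete, here is a minimal sketch using only `java.nio.ByteBuffer` rather than Netty's `ByteBuf`. It is an illustration under assumptions, not the code in this PR: the class `DecodeCopyVsSlice`, the `allocated` counter, and both `decode*` helpers are hypothetical names. `decodeByCopy` mimics a decoder that allocates a fresh direct buffer of `dataLength` bytes while the incoming frame is still alive; `decodeBySlice` returns a view over the frame's memory instead.

```java
import java.nio.ByteBuffer;

public class DecodeCopyVsSlice {
    // Hypothetical counter standing in for an allocator's direct-memory accounting.
    static long allocated = 0;

    static ByteBuffer allocateDirect(int size) {
        allocated += size;
        return ByteBuffer.allocateDirect(size);
    }

    // Copying decode: a second direct buffer is allocated while the frame
    // is still referenced, so payload memory is briefly held twice.
    static ByteBuffer decodeByCopy(ByteBuffer frame, int dataLength) {
        ByteBuffer data = allocateDirect(dataLength);
        ByteBuffer src = frame.duplicate();          // independent position/limit, shared bytes
        src.limit(src.position() + dataLength);
        data.put(src);                               // copies dataLength bytes
        data.flip();
        return data;
    }

    // Slicing decode: no new allocation, the result shares the frame's memory.
    static ByteBuffer decodeBySlice(ByteBuffer frame, int dataLength) {
        ByteBuffer view = frame.duplicate();
        view.limit(view.position() + dataLength);
        return view.slice();
    }

    public static void main(String[] args) {
        int dataLength = 1024 * 1024;
        ByteBuffer frame = allocateDirect(dataLength);
        System.out.println("after receiving frame: " + allocated);

        ByteBuffer copied = decodeByCopy(frame, dataLength);
        System.out.println("after copying decode:  " + allocated);

        ByteBuffer sliced = decodeBySlice(frame, dataLength);
        System.out.println("after slicing decode:  " + allocated);

        // The slice observes writes to the frame; the copy does not.
        frame.put(0, (byte) 7);
        System.out.println("slice sees write: " + (sliced.get(0) == 7)
                + ", copy sees write: " + (copied.get(0) == 7));
    }
}
```

In Netty terms, the closest analogue to the slicing path would be something like `ByteBuf.readRetainedSlice(int)`, which shares the underlying memory and bumps the reference count; whether the PR takes that route is not stated in this message. The trade-off is the one raised above: a shared view keeps the whole frame alive until every slice is released, so memory is only returned once all partitions backed by that frame have been flushed.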
