[
https://issues.apache.org/jira/browse/FLINK-36348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xuannan Su reassigned FLINK-36348:
----------------------------------
Assignee: Xuannan Su
> Netty shuffle direct memory consumption end-to-end test failed due to direct
> memory OOM
> ---------------------------------------------------------------------------------------
>
> Key: FLINK-36348
> URL: https://issues.apache.org/jira/browse/FLINK-36348
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.0-preview
> Reporter: Weijie Guo
> Assignee: Xuannan Su
> Priority: Major
>
> The root cause is visible in the downloaded build artifacts:
> {code:java}
> org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Direct buffer memory (connection to 'localhost/127.0.0.1:45889 [localhost:42633-cbcb9d]')
>     at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:175) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:265) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:238) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:231) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:258) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:238) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:658) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:691) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory. The direct out-of-memory error has occurred. This can mean two things: either job(s) require(s) a larger size of JVM direct memory or there is a direct memory leak. The direct memory can be allocated by user code or some of its dependencies. In this case 'taskmanager.memory.task.off-heap.size' configuration option should be increased. Flink framework and its dependencies also consume the direct memory, mostly for network communication. The most of network memory is managed by Flink and should not result in out-of-memory error. In certain special cases, in particular for jobs with high parallelism, the framework may require more direct memory which is not managed by Flink. In this case 'taskmanager.memory.framework.off-heap.size' configuration option should be increased. If the error persists then there is probably a direct memory leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
>     at java.nio.Bits.reserveMemory(Bits.java:175) ~[?:?]
>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) ~[?:?]
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) ~[?:?]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:717) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:692) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.tcacheAllocateSmall(PoolArena.java:180) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocate(PoolArena.java:137) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocate(PoolArena.java:129) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.runtime.io.network.netty.BufferResponseDecoder.onChannelActive(BufferResponseDecoder.java:54) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelActive(NettyMessageClientDecoderDelegate.java:74) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:262) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>     ... 14 more
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62343&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=10d6732b-d79a-5c68-62a5-668516de5313&l=13005
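
The trace above fails inside java.nio.ByteBuffer.allocateDirect, which reserves memory against the JVM's direct memory limit. As a minimal sketch for anyone reproducing this locally (not part of the original report, and not Flink code), the snippet below reads the JVM's "direct" buffer pool through the standard BufferPoolMXBean before and after one allocation. The class name DirectMemoryProbe and the 1 MiB size are illustrative assumptions.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of Flink or Netty.
public class DirectMemoryProbe {

    /** Returns the bytes currently used by the JVM's "direct" buffer pool, or -1 if the pool is not reported. */
    static long directPoolUsedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directPoolUsedBytes();

        // Same JDK entry point as in the trace: ByteBuffer.allocateDirect.
        // The 1 MiB size is an arbitrary choice for this sketch.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);

        long after = directPoolUsedBytes();
        System.out.println("direct pool used before: " + before + " bytes");
        System.out.println("direct pool used after:  " + after + " bytes");
        System.out.println("allocated capacity:      " + buffer.capacity() + " bytes");
    }
}
```

Running it with a small limit such as -XX:MaxDirectMemorySize=512k should make the allocation fail with a direct buffer OutOfMemoryError, the same error class as in the trace. Whether the fix for this test is to raise 'taskmanager.memory.framework.off-heap.size' or to reduce the allocation itself is not settled by this sketch.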