[
https://issues.apache.org/jira/browse/FLINK-35587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855933#comment-17855933
]
Junrui Li commented on FLINK-35587:
-----------------------------------
[~showuon] Unfortunately, I didn't save the full logs because they are
extremely large. I did go through the relevant logs and found no related
information beyond the exception stack trace below:
2024-06-13 13:32:51,000 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - MultipleInput[5162] (985/1000) (6ff5e43c6f702fdbe4a47889fe5d0f92_d222c5a8f36c1518f55d193152e8a83f_984_0) switched from RUNNING to FAILED on container_e01_1717399913264_0220_01_000055 @ core-1-1.c-b72ca0bb18f973ba.cn-zhangjiakou.emr.aliyuncs.com (dataPort=38151).
java.io.IOException: java.lang.IllegalStateException: The read buffer is null in credit-based input channel.
    at org.apache.flink.runtime.io.network.partition.consumer.InputChannel.checkError(InputChannel.java:275) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.checkPartitionRequestQueueInitialized(RemoteInputChannel.java:885) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.getNextBuffer(RemoteInputChannel.java:257) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.readBufferFromInputChannel(SingleInputGate.java:917) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.readRecoveredOrNormalBuffer(SingleInputGate.java:912) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:852) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:823) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.pollNext(SingleInputGate.java:811) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.pollNext(InputGateWithMetrics.java:130) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:150) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:122) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:579) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:909) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:858) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_392]
Caused by: java.lang.IllegalStateException: The read buffer is null in credit-based input channel.
    at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.decodeBufferOrEvent(CreditBasedPartitionRequestClientHandler.java:367) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.decodeMsg(CreditBasedPartitionRequestClientHandler.java:291) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelRead(CreditBasedPartitionRequestClientHandler.java:191) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelRead(NettyMessageClientDecoderDelegate.java:112) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    ... 1 more
> Job fails with "The read buffer is null in credit-based input channel" on
> TPC-DS 10TB benchmark
> -----------------------------------------------------------------------------------------------
>
> Key: FLINK-35587
> URL: https://issues.apache.org/jira/browse/FLINK-35587
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.20.0
> Reporter: Junrui Li
> Assignee: Weijie Guo
> Priority: Blocker
> Attachments: image-2024-06-13-13-48-37-162.png
>
>
> While running the TPC-DS 10TB benchmark on the latest master branch locally,
> I encountered failures in certain queries, specifically query 75, with the
> error "The read buffer is null in credit-based input channel".
> Using a binary search approach, I identified the offending commit as
> FLINK-33668. After reverting FLINK-33668 and subsequent commits, the issue
> disappears. Re-applying FLINK-33668 to the branch re-introduces the error.
> Please see the attached image for the error stack trace.
> !image-2024-06-13-13-48-37-162.png|width=846,height=555!