[ 
https://issues.apache.org/jira/browse/FLINK-35587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855933#comment-17855933
 ] 

Junrui Li commented on FLINK-35587:
-----------------------------------

[~showuon] Unfortunately, I didn't save the full logs because they are 
extremely large. I did review the relevant logs, but found nothing related 
beyond the exception stack trace below:

2024-06-13 13:32:51,000 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - MultipleInput[5162] (985/1000) (6ff5e43c6f702fdbe4a47889fe5d0f92_d222c5a8f36c1518f55d193152e8a83f_984_0) switched from RUNNING to FAILED on container_e01_1717399913264_0220_01_000055 @ core-1-1.c-b72ca0bb18f973ba.cn-zhangjiakou.emr.aliyuncs.com (dataPort=38151).
java.io.IOException: java.lang.IllegalStateException: The read buffer is null in credit-based input channel.
    at org.apache.flink.runtime.io.network.partition.consumer.InputChannel.checkError(InputChannel.java:275) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.checkPartitionRequestQueueInitialized(RemoteInputChannel.java:885) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.getNextBuffer(RemoteInputChannel.java:257) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.readBufferFromInputChannel(SingleInputGate.java:917) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.readRecoveredOrNormalBuffer(SingleInputGate.java:912) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:852) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:823) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.pollNext(SingleInputGate.java:811) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.pollNext(InputGateWithMetrics.java:130) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:150) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:122) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:579) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:909) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:858) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_392]
Caused by: java.lang.IllegalStateException: The read buffer is null in credit-based input channel.
    at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.decodeBufferOrEvent(CreditBasedPartitionRequestClientHandler.java:367) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.decodeMsg(CreditBasedPartitionRequestClientHandler.java:291) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelRead(CreditBasedPartitionRequestClientHandler.java:191) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelRead(NettyMessageClientDecoderDelegate.java:112) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
    ... 1 more

> Job fails with "The read buffer is null in credit-based input channel" on 
> TPC-DS 10TB benchmark
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35587
>                 URL: https://issues.apache.org/jira/browse/FLINK-35587
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.20.0
>            Reporter: Junrui Li
>            Assignee: Weijie Guo
>            Priority: Blocker
>         Attachments: image-2024-06-13-13-48-37-162.png
>
>
> While running the TPC-DS 10TB benchmark locally on the latest master branch, 
> I encountered failures in certain queries, specifically query 75, with the 
> error "The read buffer is null in credit-based input channel".
> Using a binary search approach, I identified the offending commit as 
> FLINK-33668. After reverting FLINK-33668 and subsequent commits, the issue 
> disappears. Re-applying FLINK-33668 to the branch re-introduces the error.
> Please see the attached image for the error stack trace.
> !image-2024-06-13-13-48-37-162.png|width=846,height=555!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
