[ https://issues.apache.org/jira/browse/FLINK-25606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yingjie Cao updated FLINK-25606:
--------------------------------
Component/s: Runtime / Network
> Requesting exclusive buffers timeout when recovering from unaligned
> checkpoint under fine-grained resource mode
> ---------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-25606
> URL: https://issues.apache.org/jira/browse/FLINK-25606
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Reporter: Yingjie Cao
> Priority: Major
>
> When converting a RecoveredInputChannel to a RemoteInputChannel, there are not
> enough network buffers to initialize the input channel's exclusive buffers, and
> the request times out. Here is the exception stack:
> {code:java}
> java.io.IOException: Timeout triggered when requesting exclusive buffers: The total number of network buffers is currently set to 6144 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max', or you may increase the timeout which is 30000ms by setting the key 'taskmanager.network.memory.exclusive-buffers-request-timeout-ms'.
> 	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.requestMemorySegments(NetworkBufferPool.java:205)
> 	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.requestMemorySegments(NetworkBufferPool.java:60)
> 	at org.apache.flink.runtime.io.network.partition.consumer.BufferManager.requestExclusiveBuffers(BufferManager.java:133)
> 	at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.setup(RemoteInputChannel.java:157)
> 	at org.apache.flink.runtime.io.network.partition.consumer.RemoteRecoveredInputChannel.toInputChannelInternal(RemoteRecoveredInputChannel.java:77)
> 	at org.apache.flink.runtime.io.network.partition.consumer.RecoveredInputChannel.toInputChannel(RecoveredInputChannel.java:106)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.convertRecoveredInputChannels(SingleInputGate.java:307)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:290)
> 	at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:94)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:359)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:323)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:202)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
> 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> 	at java.lang.Thread.run(Thread.java:834)
> {code}
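To make the failure mode above concrete, here is a minimal Java sketch under simplifying assumptions: a fixed pool of equally sized segments guarded by a fair java.util.concurrent.Semaphore, where a channel's exclusive segments are requested all at once and the request fails with an IOException if they do not become free within a timeout. ToyBufferPool, requestExclusive, recycle and ToyBufferPoolDemo are hypothetical names; this is not Flink's NetworkBufferPool, only an illustration of the timeout shape seen in the stack trace. For scale, the pool in the error message is 6144 * 32768 bytes = 201,326,592 bytes (192 MiB).

{code:java}
import java.io.IOException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical toy model of a fixed-size network buffer pool.
 * Not Flink's NetworkBufferPool; names and behavior are simplified for illustration.
 */
final class ToyBufferPool {
    private final int totalSegments;
    private final int segmentSizeBytes;
    // One permit per free memory segment; fair, so waiting requests are served in FIFO order.
    private final Semaphore freeSegments;

    ToyBufferPool(int totalSegments, int segmentSizeBytes) {
        this.totalSegments = totalSegments;
        this.segmentSizeBytes = segmentSizeBytes;
        this.freeSegments = new Semaphore(totalSegments, true);
    }

    /** Total pool capacity in bytes; the long cast avoids int overflow for large pools. */
    long totalBytes() {
        return (long) totalSegments * segmentSizeBytes;
    }

    int availableSegments() {
        return freeSegments.availablePermits();
    }

    /**
     * All-or-nothing request for a channel's exclusive segments. If they do not all
     * become free within timeoutMs, nothing is taken and an IOException is thrown.
     */
    void requestExclusive(int numSegments, long timeoutMs) throws IOException {
        boolean acquired;
        try {
            acquired = freeSegments.tryAcquire(numSegments, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IOException("Interrupted while requesting exclusive buffers", e);
        }
        if (!acquired) {
            throw new IOException("Timeout triggered when requesting exclusive buffers: requested "
                    + numSegments + ", free " + availableSegments() + " of " + totalSegments);
        }
    }

    /** Returns segments to the pool, e.g. when a channel releases its buffers. */
    void recycle(int numSegments) {
        freeSegments.release(numSegments);
    }
}

final class ToyBufferPoolDemo {
    public static void main(String[] args) throws IOException {
        // Pool size taken from the error message: 6144 segments of 32768 bytes.
        ToyBufferPool reported = new ToyBufferPool(6144, 32768);
        System.out.println(reported.totalBytes() / (1024 * 1024) + " MiB");

        ToyBufferPool pool = new ToyBufferPool(4, 32768);
        pool.requestExclusive(3, 10); // 3 of 4 segments are now held elsewhere
        try {
            pool.requestExclusive(2, 100); // only 1 free: waits 100 ms, then fails
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        pool.recycle(3);               // the held segments are released
        pool.requestExclusive(2, 100); // the same request now succeeds
        System.out.println("free after retry: " + pool.availableSegments());
    }
}
{code}

Running ToyBufferPoolDemo prints "192 MiB", then the timeout message, then "free after retry: 2". The all-or-nothing acquire is a design choice of this sketch: a failed request leaves the pool unchanged instead of holding a partial set of segments, so raising the timeout only helps if enough segments are actually released within that window.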
--
This message was sent by Atlassian Jira
(v8.20.1#820001)