[
https://issues.apache.org/jira/browse/FLINK-22553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yun Gao updated FLINK-22553:
----------------------------
Fix Version/s: 1.16.0
> Improve error reporting on TM connection failures
> -------------------------------------------------
>
> Key: FLINK-22553
> URL: https://issues.apache.org/jira/browse/FLINK-22553
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Reporter: Roman Khachatryan
> Priority: Minor
> Fix For: 1.15.0, 1.16.0
>
>
> Connection failures reported by NettyPartitionRequestClient contain a
> misleading NPE in their stack trace, e.g. as reported in
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/remote-task-manager-netty-exception-td43401.html
> {code}
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/100.98.115.117:41245' has failed. This might indicate that the remote task manager has been lost.
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:145) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:114) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:81) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:70) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:179) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.internalRequestPartitions(SingleInputGate.java:321) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:290) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:94) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) [flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) [flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:297) [flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:189) [flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617) [flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581) [flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) [flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) [flink-dist_2.12-1.12.2.jar:1.12.2]
>     at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:59) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient.<init>(NettyPartitionRequestClient.java:74) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:136) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>     ... 16 more
> {code}
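
The trace above suggests a general pattern: a null check in a client constructor fires before the connection failure is inspected, so the NPE becomes the reported cause. The following is a minimal, standalone sketch of that pattern and of one possible way to surface the real cause. It is not Flink's code. All names (ConnectErrorSketch, ConnectResult, Client, naiveConnect, checkedConnect) are hypothetical, and java.io.IOException stands in for RemoteTransportException.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.Objects;

public class ConnectErrorSketch {

    /** Hypothetical outcome of a connection attempt. */
    static final class ConnectResult {
        final Object channel;   // null when the attempt failed
        final Throwable cause;  // null when the attempt succeeded

        ConnectResult(Object channel, Throwable cause) {
            this.channel = channel;
            this.cause = cause;
        }
    }

    /** Hypothetical client whose constructor rejects a null channel. */
    static final class Client {
        final Object channel;

        Client(Object channel) {
            this.channel = Objects.requireNonNull(channel);
        }
    }

    /** Pattern from the trace: the null check fails first, so the NPE is wrapped as the cause. */
    static Client naiveConnect(ConnectResult result, String address) throws IOException {
        try {
            return new Client(result.channel);
        } catch (Exception e) {
            throw new IOException("Connecting to '" + address + "' has failed.", e);
        }
    }

    /** Possible improvement (an assumption, not the actual fix): check the failure before building the client. */
    static Client checkedConnect(ConnectResult result, String address) throws IOException {
        if (result.cause != null) {
            throw new IOException("Connecting to '" + address + "' has failed.", result.cause);
        }
        return new Client(result.channel);
    }

    public static void main(String[] args) {
        ConnectResult failed =
                new ConnectResult(null, new ConnectException("Connection refused"));
        try {
            naiveConnect(failed, "host:1234");
        } catch (IOException e) {
            System.out.println("naive cause:   " + e.getCause());
        }
        try {
            checkedConnect(failed, "host:1234");
        } catch (IOException e) {
            System.out.println("checked cause: " + e.getCause());
        }
    }
}
```

In this sketch the naive variant reports a NullPointerException as the cause, while the checked variant reports the underlying ConnectException, which is the information a user needs when diagnosing a lost task manager.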