HuangZhenQiu commented on a change in pull request #11541:
URL: https://github.com/apache/flink/pull/11541#discussion_r442926109
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/PartitionRequestClientFactory.java
##########
@@ -131,6 +134,42 @@ void destroyPartitionRequestClient(ConnectionID connectionId, PartitionRequestCl
 		clients.remove(connectionId, client);
 	}
+	private NettyPartitionRequestClient connectChannelWithRetry(ConnectingChannel connectingChannel,
+			ConnectionID connectionId, boolean needConnect)
+			throws IOException, InterruptedException {
+		int count = 0;
+		Exception exception = null;
+		do {
+			try {
+				if (needConnect) {
+					LOG.info("Connecting to {} at {} attempt", connectionId.getAddress(), count);
+					nettyClient.connect(connectionId.getAddress()).addListener(connectingChannel);
+				}
+
+				NettyPartitionRequestClient client = connectingChannel.waitForChannel();
+				clients.replace(connectionId, connectingChannel, client);
+				return client;
+			} catch (IOException | ChannelException e) {
+				LOG.error("Failed {} times to connect to {}", count, connectionId.getAddress(), e);
+				ConnectingChannel newConnectingChannel = new ConnectingChannel(connectionId, this);
+				clients.replace(connectionId, connectingChannel, newConnectingChannel);
Review comment:
Yes. This was the cause of the deadlock before `synchronized` was added around the connectionId change.