HuangZhenQiu commented on a change in pull request #11541:
URL: https://github.com/apache/flink/pull/11541#discussion_r442925856
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/PartitionRequestClientFactory.java
##########
@@ -131,6 +134,42 @@ void destroyPartitionRequestClient(ConnectionID connectionId, PartitionRequestCl
clients.remove(connectionId, client);
}
+ private NettyPartitionRequestClient connectChannelWithRetry(ConnectingChannel connectingChannel,
+ ConnectionID connectionId, boolean needConnect)
+ throws IOException, InterruptedException {
+ int count = 0;
+ Exception exception = null;
+ do {
+ try {
+ if (needConnect) {
+ LOG.info("Connecting to {} at {} attempt", connectionId.getAddress(), count);
+ nettyClient.connect(connectionId.getAddress()).addListener(connectingChannel);
+ }
+
+ NettyPartitionRequestClient client = connectingChannel.waitForChannel();
+ clients.replace(connectionId, connectingChannel, client);
+ return client;
+ } catch (IOException | ChannelException e) {
+ LOG.error("Failed {} times to connect to {}", count, connectionId.getAddress(), e);
Review comment:
Good catch. That is why I synchronized on connectionId in line 74. It
ensures that only one thread at a time can attempt the connection for a
particular connectionId; that attempt either succeeds or fails after several retries.
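
To illustrate the idea of letting a single thread per connection key run the retry loop, here is a minimal, self-contained sketch. It is not the code from this PR: the class `PerKeyRetrySketch`, the `Connector` interface, the string-typed addresses and clients, and the per-key lock map are all hypothetical stand-ins chosen so the example runs without Flink or Netty.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: one lock object per address, so only one thread at a
// time runs the retry loop for a given address. Not the actual Flink class.
class PerKeyRetrySketch {

    /** Hypothetical stand-in for whatever actually opens the connection. */
    interface Connector {
        String connect(String address) throws IOException;
    }

    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, String> clients = new ConcurrentHashMap<>();
    private final Connector connector;
    private final int maxAttempts;

    PerKeyRetrySketch(Connector connector, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.connector = connector;
        this.maxAttempts = maxAttempts;
    }

    String getOrConnect(String address) throws IOException {
        // computeIfAbsent is atomic, so all threads see the same lock per address.
        Object lock = locks.computeIfAbsent(address, k -> new Object());
        synchronized (lock) {
            // A thread that waited on the lock reuses the client created by the winner.
            String existing = clients.get(address);
            if (existing != null) {
                return existing;
            }
            IOException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    String client = connector.connect(address);
                    clients.put(address, client);
                    return client;
                } catch (IOException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw new IOException(
                "Failed to connect to " + address + " after " + maxAttempts + " attempts", last);
        }
    }
}
```

In this sketch, threads asking for different addresses do not block each other, while threads asking for the same address wait for the first one to either succeed or exhaust its retries.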
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]