ArafatKhan2198 opened a new pull request, #5782:
URL: https://github.com/apache/ozone/pull/5782
## What changes were proposed in this pull request?
`TestSecureContainerServer` uses random ports and intermittently fails with
the binding error shown in the stack trace below.
My initial hypothesis was that two datanodes of the same pipeline were being
assigned the same port number. To check this, I added logging on my local fork
branch that prints the datanode details of each datanode of a specific
pipeline, and I was able to reproduce the failure.
In the reproduced run, the error was thrown for `0.0.0.0:37397`, so I expected
to find two datanodes sharing that port. However, the logs showed no problem
with the assigned ports, and yet the binding error was still thrown. **_I
believe it could be caused by a port release delay: after a previous test
completes, there might be a delay in releasing the port._**
I have implemented a "Retry Check with Delay" method that starts the
specified server and, if it fails to bind to its designated port, retries
after a small delay.
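
The retry-with-delay idea can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `BindRetry`, `Startable`, `startWithRetry`, and `isBindFailure` are hypothetical names, and detecting a bind failure by walking the cause chain and matching the messages seen in the stack trace is an assumption.

```java
import java.net.BindException;

/** Hypothetical helper illustrating "retry check with delay"; not the PR's actual code. */
final class BindRetry {

  /** Hypothetical stand-in for any server whose start() may fail to bind its port. */
  @FunctionalInterface
  interface Startable {
    void start() throws Exception;
  }

  private BindRetry() { }

  /**
   * Starts the server, retrying after a fixed delay when the failure looks
   * like a port-binding error. Returns the number of attempts that were made.
   * Any other failure is rethrown immediately.
   */
  static int startWithRetry(Startable server, int maxAttempts, long delayMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        server.start();
        return attempt;
      } catch (Exception e) {
        if (!isBindFailure(e)) {
          throw e; // unrelated failure: do not hide it behind retries
        }
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(delayMillis); // give the OS time to release the port
        }
      }
    }
    throw last; // every attempt failed to bind
  }

  /** Assumption: a bind failure shows up somewhere in the cause chain. */
  static boolean isBindFailure(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof BindException) {
        return true;
      }
      String msg = c.getMessage();
      if (msg != null && (msg.contains("Address already in use")
          || msg.contains("Failed to bind"))) {
        return true;
      }
    }
    return false;
  }
}
```

In the test, the lambda passed to `startWithRetry` would wrap the server start call (for example the `XceiverServerRatis.start()` call seen in the stack trace). The attempt count and delay are tuning choices, not values taken from this PR.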
```
org.apache.ratis.util.ExitUtils$ExitException: Failed to start Grpc server
    at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:141)
    at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:151)
    at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:300)
    at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
    at org.apache.ratis.server.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:72)
    at org.apache.ratis.server.impl.RaftServerProxy.startImpl(RaftServerProxy.java:407)
    at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:270)
    at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:400)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:552)
    at org.apache.hadoop.ozone.container.server.TestSecureContainerServer.runTestClientServer(TestSecureContainerServer.java:250)
    at org.apache.hadoop.ozone.container.server.TestSecureContainerServer.runTestClientServerRatis(TestSecureContainerServer.java:223)
    at org.apache.hadoop.ozone.container.server.TestSecureContainerServer.testClientServerRatisGrpc(TestSecureContainerServer.java:201)
    ...
Caused by: java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:37397
    at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:326)
    at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:185)
    at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:94)
    at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:298)
    ... 53 more
Caused by: org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
```
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9881
## How was this patch tested?
Ran the test with the fix a total of 900 times; all runs passed.
### Repeated Fork Runs ➖
Test Run 1 ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192417354/attempts/1
Test Run 1 Repeated ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192417354/attempts/2
Test Run 1 Repeated ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192417354
Test Run 2 ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192423383/attempts/1
Test Run 2 Repeated ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192423383/attempts/2
Test Run 2 Repeated ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192423383
Test Run 3 ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192422339/attempts/1
Test Run 3 Repeated ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192422339/attempts/2
Test Run 3 Repeated ➖
https://github.com/ArafatKhan2198/ozone/actions/runs/7192422339