adoroszlai opened a new pull request, #4699:
URL: https://github.com/apache/ozone/pull/4699
## What changes were proposed in this pull request?
`TestDecommissionAndMaintenance` uses `MiniOzoneClusterProvider` to
provision clusters in the background. Tests intermittently fail due to port
conflicts.
```
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 348.139 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance
org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance.testNodeWithOpenPipelineCanBeDecommissionedAndRecommissioned Time elapsed: 159.55 s <<< ERROR!
java.util.concurrent.TimeoutException:
...
    at org.apache.hadoop.ozone.MiniOzoneClusterImpl.waitForClusterToBeReady(MiniOzoneClusterImpl.java:218)
    at org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:431)
    at org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance.testNodeWithOpenPipelineCanBeDecommissionedAndRecommissioned(TestDecommissionAndMaintenance.java:234)
```
The problem is that, while the datanode is stopped, its ports may be reused
by some component in a new cluster being provisioned in the background. The
original owner of the port then fails to start, and the cluster never becomes
ready again.
```
2023-05-10 07:26:13,629 [EndpointStateMachine task thread for /0.0.0.0:45947 - 0 ] INFO server.GrpcService (GrpcService.java:startImpl(302)) - 3193002e-fc2b-4cc9-9970-da2531c45e46: GrpcService started, listening on 44925
...
2023-05-10 07:26:37,941 [main] INFO server.GrpcService (GrpcService.java:closeImpl(320)) - 3193002e-fc2b-4cc9-9970-da2531c45e46: shutdown server GrpcServerProtocolService successfully
...
2023-05-10 07:26:45,485 [EndpointStateMachine task thread for /0.0.0.0:34213 - 0 ] INFO server.GrpcService (GrpcService.java:startImpl(302)) - 0c852ae0-3c0b-4f2d-b68a-19e305d37000: GrpcService started, listening on 44925
...
2023-05-10 07:26:46,652 [EndpointStateMachine task thread for /0.0.0.0:45947 - 0 ] INFO ratis.XceiverServerRatis (XceiverServerRatis.java:start(517)) - Starting XceiverServerRatis 3193002e-fc2b-4cc9-9970-da2531c45e46
2023-05-10 07:26:46,658 [EndpointStateMachine task thread for /0.0.0.0:45947 - 0 ] ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server
java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:44925
...
Caused by: org.apache.ratis.thirdparty.io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use
```
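The sequence in the log above (a service listens on a port, shuts down, another service takes the same port, and the first one fails to come back) can be replayed in isolation with plain sockets. This is only an illustrative sketch: `PortReuseDemo` and its methods are hypothetical names, not Ozone code, and the "intruder" binds the freed port deliberately, whereas in the failing test the operating system happened to hand it out to a component of the background cluster.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical demo class, not part of Ozone.
final class PortReuseDemo {

  private static final InetAddress LOOPBACK = InetAddress.getLoopbackAddress();

  private PortReuseDemo() { }

  /** Returns true if binding a new listener to the port fails because it is in use. */
  static boolean bindFails(int port) {
    try (ServerSocket retry = new ServerSocket(port, 50, LOOPBACK)) {
      return false;
    } catch (BindException e) {
      return true; // "Address already in use"
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Replays: stop the owner, let someone else take the port, try to restart. */
  static boolean restartFailsAfterPortIsTaken() {
    try {
      int port;
      // 1. The original owner starts on an OS-assigned port, then stops.
      try (ServerSocket original = new ServerSocket(0, 50, LOOPBACK)) {
        port = original.getLocalPort();
      }
      // 2. While the owner is down, another component binds the same port
      //    (forced here; random in the real test run).
      try (ServerSocket intruder = new ServerSocket(port, 50, LOOPBACK)) {
        // 3. The original owner tries to restart on its old port.
        return bindFails(port);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("restart failed: " + restartFailsAfterPortIsTaken());
  }
}
```

As long as the intruder holds the port, the restart attempt cannot bind, which matches the `Address already in use` error that leaves the cluster waiting until the timeout.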
This PR replaces random ports with a simple incremental allocation starting
at 15000. It applies to all `MiniOzoneCluster`-based tests.
https://issues.apache.org/jira/browse/HDDS-8581
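
For readers unfamiliar with the approach, incremental allocation can be sketched roughly as follows. This is not the code from the PR: the class and method names are hypothetical, and only the starting value of 15000 comes from the description above. The sketch assumes all clusters are provisioned within one JVM, and it does not check whether a port is actually free on the host.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not the class used in the PR.
final class IncrementalPorts {

  // Starting value taken from the PR description.
  private static final int FIRST_PORT = 15000;
  private static final int MAX_PORT = 65535;

  // Shared counter, so concurrent callers never receive the same number.
  private static final AtomicInteger NEXT = new AtomicInteger(FIRST_PORT);

  private IncrementalPorts() { }

  /** Returns a port number that this JVM has not handed out before. */
  static int next() {
    int port = NEXT.getAndIncrement();
    if (port > MAX_PORT) {
      throw new IllegalStateException("port range exhausted");
    }
    return port;
  }
}
```

Because every caller gets a distinct number, a cluster provisioned in the background is never assigned a port that a stopped datanode expects to reclaim on restart.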
## How was this patch tested?
CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4945442087
100x run of `TestDecommissionAndMaintenance`:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4944968792