cxzl25 opened a new pull request, #2697: URL: https://github.com/apache/celeborn/pull/2697
### What changes were proposed in this pull request?

### Why are the changes needed?

Because the worker port is in use, the driver's worker status may change from shutdown status to unknown, causing the test to fail.

https://github.com/apache/celeborn/actions/runs/10465286274/job/28980278764

```java
- celeborn spark integration test - pushdata timeout will add to pushExcludedWorkers *** FAILED ***
  WORKER_UNKNOWN did not equal PUSH_DATA_TIMEOUT_PRIMARY, and WORKER_UNKNOWN did not equal PUSH_DATA_TIMEOUT_REPLICA (PushDataTimeoutTest.scala:150)
```

unit-tests.log
```
24/08/20 05:28:30,400 INFO [celeborn-dispatcher-7] Master: Receive ReportNodeFailure [
Host: localhost
RpcPort: 41487
PushPort: 34259
FetchPort: 45713
ReplicatePort: 35107
InternalPort: 41487

24/08/20 05:29:29,414 WARN [celeborn-client-lifecycle-manager-change-partition-executor-3] WorkerStatusTracker: Reporting failed workers:
Host:localhost:RpcPort:42267:PushPort:43741:FetchPort:46483:ReplicatePort:43587 PUSH_DATA_TIMEOUT_PRIMARY 2024-08-19T22:29:29.414-0700
Current unknown workers:
Host:localhost:RpcPort:41487:PushPort:34259:FetchPort:45713:ReplicatePort:35107:InternalPort:41487 2024-08-19T22:29:29.108-0700
Current shutdown workers:
Host:localhost:RpcPort:41487:PushPort:34259:FetchPort:45713:ReplicatePort:35107:InternalPort:41487
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA
