cxzl25 opened a new pull request, #2697:
URL: https://github.com/apache/celeborn/pull/2697

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   Because the worker port is already in use, the worker status recorded by the driver may change from shutdown to unknown, which causes the test to fail.
   
   https://github.com/apache/celeborn/actions/runs/10465286274/job/28980278764
   ```java
   - celeborn spark integration test - pushdata timeout will add to pushExcludedWorkers *** FAILED ***
     WORKER_UNKNOWN did not equal PUSH_DATA_TIMEOUT_PRIMARY, and WORKER_UNKNOWN did not equal PUSH_DATA_TIMEOUT_REPLICA (PushDataTimeoutTest.scala:150)
   ```
   
   unit-tests.log
   ```
   24/08/20 05:28:30,400 INFO [celeborn-dispatcher-7] Master: Receive ReportNodeFailure [
   Host: localhost
   RpcPort: 41487
   PushPort: 34259
   FetchPort: 45713
   ReplicatePort: 35107
   InternalPort: 41487
   
   24/08/20 05:29:29,414 WARN [celeborn-client-lifecycle-manager-change-partition-executor-3] WorkerStatusTracker:
   Reporting failed workers:
   Host:localhost:RpcPort:42267:PushPort:43741:FetchPort:46483:ReplicatePort:43587   PUSH_DATA_TIMEOUT_PRIMARY   2024-08-19T22:29:29.414-0700
   Current unknown workers:
   Host:localhost:RpcPort:41487:PushPort:34259:FetchPort:45713:ReplicatePort:35107:InternalPort:41487   2024-08-19T22:29:29.108-0700
   Current shutdown workers:
   Host:localhost:RpcPort:41487:PushPort:34259:FetchPort:45713:ReplicatePort:35107:InternalPort:41487
   ```
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   GA (GitHub Actions)
   

