DanielCarter-stack commented on issue #10560:
URL: https://github.com/apache/seatunnel/issues/10560#issuecomment-4002457588

   Thanks for the detailed analysis. This looks like a valid bug.
   
   **Root Cause**
   Two failure points:
   
   1. **Primary** (`IMapCheckpointIDCounter.java:47-55`):  
      `RetryUtils.retryWithException` calls `ExceptionUtil.isOperationNeedRetryException`, which only checks for `HazelcastInstanceNotActiveException`, `InterruptedException`, and `OperationTimeoutException` (`ExceptionUtil.java:154-159`).  
      `RetryableHazelcastException` is **missing** from this check, so it is treated as non-retryable and SeaTunnel's 30-retry window is skipped entirely.
   
   2. **Secondary** (`CoordinatorService.java:507`):  
      `runningJobStateIMap.get(jobId)` in `restoreJobFromMasterActiveSwitch` has no retry protection.
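
To illustrate the primary failure point, here is a minimal, self-contained sketch of how a retry predicate gates the retry loop. It is not SeaTunnel's `RetryUtils`: the `retry` helper, `RetryGateDemo`, and the two exception classes are hypothetical stand-ins so the example compiles without Hazelcast on the classpath. `GateTimeoutException` plays the role of an exception the predicate recognizes, and `GateRetryableException` plays the role of `RetryableHazelcastException`.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical stand-ins, not the real Hazelcast classes.
class GateTimeoutException extends RuntimeException { }    // recognized by the predicate
class GateRetryableException extends RuntimeException { }  // missing from the predicate

class RetryGateDemo {
    // Simplified retry loop: retry only while the gate accepts the failure.
    static <T> T retry(Callable<T> op, int maxRetries, Predicate<Exception> gate)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Give up when retries are exhausted or the failure is not retryable.
                if (attempt >= maxRetries || !gate.test(e)) {
                    throw e;
                }
            }
        }
    }

    // Counts how many times an always-failing operation is attempted.
    static int attemptsFor(RuntimeException failure, int maxRetries) {
        int[] attempts = {0};
        // The gate only knows about the timeout stand-in.
        Predicate<Exception> gate = e -> e instanceof GateTimeoutException;
        try {
            RetryGateDemo.<Object>retry(() -> {
                attempts[0]++;
                throw failure;
            }, maxRetries, gate);
        } catch (Exception expected) {
            // The operation always fails; only the attempt count matters here.
        }
        return attempts[0];
    }

    public static void main(String[] args) {
        System.out.println("recognized:   " + attemptsFor(new GateTimeoutException(), 30));
        System.out.println("unrecognized: " + attemptsFor(new GateRetryableException(), 30));
    }
}
```

With a budget of 30 retries, the recognized exception is attempted 31 times (one initial call plus 30 retries), while the unrecognized one is attempted once and rethrown immediately.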
   
   **Suggested Fix**
   Add `RetryableHazelcastException` to the exceptions recognized by `ExceptionUtil.isOperationNeedRetryException`, and consider wrapping the IMap `get` operation in `restoreJobFromMasterActiveSwitch` with retry logic.
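
A rough sketch of what both parts of the fix could look like, under stated assumptions: the exception classes, `isOperationNeedRetry`, and `getWithRetry` below are hypothetical stand-ins written for this example, not the actual SeaTunnel or Hazelcast code, and the map lookup is modeled as a plain `Function` rather than a real `IMap`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins so the sketch compiles without Hazelcast.
class FixOperationTimeoutException extends RuntimeException {
    FixOperationTimeoutException(String message) { super(message); }
}
class FixRetryableHazelcastException extends RuntimeException {
    FixRetryableHazelcastException(String message) { super(message); }
}

class RetryFixSketch {
    // Part 1: the retry predicate now also accepts the retryable exception.
    static boolean isOperationNeedRetry(Throwable e) {
        return e instanceof FixOperationTimeoutException
                || e instanceof FixRetryableHazelcastException; // proposed addition
    }

    // Part 2: wrap a map lookup so transient failures are retried.
    static <K, V> V getWithRetry(Function<K, V> lookup, K key,
                                 int maxRetries, long sleepMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return lookup.apply(key);
            } catch (RuntimeException e) {
                if (attempt >= maxRetries || !isOperationNeedRetry(e)) {
                    throw e;
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, String> runningJobState = new HashMap<>();
        runningJobState.put(1L, "RUNNING");
        int[] calls = {0};
        // Simulated lookup that fails twice before succeeding.
        String state = RetryFixSketch.<Long, String>getWithRetry(jobId -> {
            if (calls[0]++ < 2) {
                throw new FixRetryableHazelcastException("transient failure");
            }
            return runningJobState.get(jobId);
        }, 1L, 30, 10L);
        System.out.println(state + " after " + calls[0] + " calls");
    }
}
```

The retry count and sleep interval here are placeholders; in the real code they would presumably come from the same configuration that already drives the checkpoint ID counter's retries.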


