Re: [I] [Bug] [seatunnel] cluster emit OperationTimeoutException，crush！ [seatunnel]

via GitHub Wed, 28 Jan 2026 22:39:03 -0800


DanielCarter-stack commented on issue #10409:
URL: https://github.com/apache/seatunnel/issues/10409#issuecomment-3815781132


   Thanks for reporting. Based on the stack traces and code, this appears to be 
two related stability issues under high load:
   
   1. **OperationTimeoutException**: `GetJobStatusOperation.run()` blocks on 
`future.get()` with no timeout. When the `get_job_status_operation` executor 
saturates or `getJobStatus()` is slow (accessing JobMaster/IMap in 
`CoordinatorService.getJobStatus()`), the operation exceeds the 25s Hazelcast 
call-timeout.
   
   2. **NullPointerException**: 
`CheckpointCoordinator.completePendingCheckpoint()` at line 913 calls 
`completedCheckpoint.getCheckpointType()`, suggesting `completedCheckpoint` is 
null. The code already has a null check in the `whenCompleteAsync` callback 
(lines 653-656 in `CheckpointCoordinator`), so this may indicate a concurrent 
modification issue or version mismatch.
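
To illustrate the two failure modes above, here is a minimal sketch of a bounded wait and a use-site null guard. It is not SeaTunnel code: the class `StabilitySketch`, the interface `HasCheckpointType`, the method names, and the fallback value are all hypothetical, and the actual fix in `GetJobStatusOperation` or `CheckpointCoordinator` may look quite different.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a completed checkpoint; not the SeaTunnel type.
interface HasCheckpointType {
    String getCheckpointType();
}

// Hypothetical helper class used only for illustration.
final class StabilitySketch {

    // Waits at most timeoutMillis for the status instead of blocking
    // indefinitely; returns the caller-supplied fallback on timeout.
    static String awaitStatus(CompletableFuture<String> future,
                              long timeoutMillis,
                              String fallback) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on the pending lookup so the calling thread is released.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            // Restore the interrupt flag for callers further up the stack.
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("status lookup failed", e.getCause());
        }
    }

    // Reads the checkpoint type only when the checkpoint is non-null,
    // so a missing checkpoint yields an empty Optional rather than an NPE.
    static Optional<String> checkpointTypeOf(HasCheckpointType completedCheckpoint) {
        return Optional.ofNullable(completedCheckpoint)
                .map(HasCheckpointType::getCheckpointType);
    }
}
```

The timeout passed to `awaitStatus` would need to stay below the Hazelcast call-timeout for the bounded wait to help; the right value and fallback behaviour are left open here.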
   
   **To help us reproduce/fix**, could you provide:
   - Cluster size and number of concurrent jobs?
   - Your `hazelcast.yaml` configuration (especially `operation.call-timeout`)?
   - Whether this occurs during checkpoint triggers or specific REST API calls?
   - Full version details for `2.3.12.hb-SNAPSHOT`?
   
   **Workaround**: Consider increasing `hazelcast.operation.call-timeout` 
(default 25000ms) in your configuration.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
