dybyte commented on PR #10075:
URL: https://github.com/apache/seatunnel/pull/10075#issuecomment-3637302136

   > I have a question about the behavior when a job is already stuck in the 
DOING_SAVEPOINT state. In that case, can stopPipelineWithCheckpointFallback 
always successfully stop the job and release the slot resources, or are there 
still situations where the job may remain stuck in DOING_SAVEPOINT?
   > 
   > ```
   > if (jobMaster.getCheckpointManager().isCompletedPipeline(pipelineId)) {
   >     forcePipelineFinish();
   > }
   > ```
   > 
   > Conceptually, what we wanted here is a “force pause” of the job. But in 
the current implementation, the force option seems to force-end the job (e.g., 
set it to CANCELED) instead of pausing it. From your point of view, does a forced 
termination really count as a “pause”?
   > 
   > @dybyte
   
   From my understanding, the main purpose of this feature is to forcefully 
terminate a job that is stuck in a certain state, so that it does not continue 
holding slot resources indefinitely.
   For that reason, the implementation focuses on ending the job rather than 
pausing it.
   
   Regarding the job being stuck in the `DOING_SAVEPOINT` state, the reporter 
did not provide detailed logs, so it’s difficult to identify the exact root 
cause. My assumption is that it may be due to an issue during the 
savepoint-writing process, or to the job not receiving the termination signal 
correctly.
   Except for extreme cases such as deadlocks, I believe the current logic 
should be able to successfully terminate the job and release its slot resources.
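
To make the fallback idea concrete, here is a minimal, hypothetical sketch. It is not the code in this PR: `ForceStopSketch`, `Pipeline`, `stopWithFallback`, the `State` enum, and the timeout parameter are all illustrative assumptions. It only shows the general pattern of waiting for a savepoint for a bounded time and then force-finishing so the slots are released.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ForceStopSketch {
    // Illustrative states only; this enum is an assumption, not the engine's real one.
    enum State { DOING_SAVEPOINT, SAVEPOINT_DONE, CANCELED }

    /** Hypothetical stand-in for a pipeline; not the SeaTunnel class. */
    static class Pipeline {
        State state = State.DOING_SAVEPOINT;
        boolean slotsHeld = true;

        // Force path: end the pipeline and give the slots back.
        void forceFinish() {
            state = State.CANCELED;
            slotsHeld = false;
        }
    }

    /** Wait for the savepoint up to timeoutMs; on timeout or failure, force-finish. */
    static State stopWithFallback(Pipeline p, CompletableFuture<Void> savepoint, long timeoutMs) {
        try {
            savepoint.get(timeoutMs, TimeUnit.MILLISECONDS);
            p.state = State.SAVEPOINT_DONE;
            p.slotsHeld = false;
        } catch (TimeoutException | ExecutionException e) {
            // Savepoint hung or failed: fall back to forced termination.
            p.forceFinish();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            p.forceFinish();
        }
        return p.state;
    }

    public static void main(String[] args) {
        Pipeline stuck = new Pipeline();
        // A future that never completes models a savepoint that hangs.
        State result = stopWithFallback(stuck, new CompletableFuture<>(), 100);
        System.out.println(result + " slotsHeld=" + stuck.slotsHeld);
        // prints "CANCELED slotsHeld=false"
    }
}
```

In this sketch the force path does not depend on the savepoint ever completing, which is why a hung savepoint alone should not keep the slots held. If the force path itself had to acquire a lock owned by the stuck thread, it could still block, which corresponds to the deadlock caveat above.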
   
   Please let me know if there is anything I might have overlooked. Thank you!


