Sephiroth1024 opened a new issue, #9589:
URL: https://github.com/apache/seatunnel/issues/9589

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   Assume there are two master nodes, m1 and m2, and m1 is the active master.
   When m1 restarts, m2 becomes the active master and restores all running jobs.
   m2 restores a running job in the following steps:
   1. Change the job state from RUNNING to PENDING.
   CoordinatorService#restoreJobFromMasterActiveSwitch
   <img width="1768" height="298" alt="Image" src="https://github.com/user-attachments/assets/317855f6-d05e-4a35-b6d3-edc14ba7dd08" />
   
   2. Apply slots.
   CoordinatorService#pendingJobSchedule
   <img width="1896" height="98" alt="Image" src="https://github.com/user-attachments/assets/95e5c2d9-fb43-45de-a9a9-6591bdd9ca98" />
   
   3. Run the job.
   3.1 CoordinatorService#pendingJobSchedule
   <img width="1892" height="400" alt="Image" src="https://github.com/user-attachments/assets/243b8196-8aa0-4557-817a-15092548d7d8" />
   
   3.2 PhysicalPlan#stateProcess
   <img width="1802" height="974" alt="Image" src="https://github.com/user-attachments/assets/87b92245-1b88-413e-97ec-47f9a3afe654" />
   Because step 1 set the job state to PENDING, the code in the red box is executed.
   However, the code in the green box returns false: step 1 changes only the job state, so the pipeline state is still RUNNING instead of CREATED.
   As a result, the job is not deployed to the worker (I think it does not need to be, because the old job is still running on the worker), and the slots applied for in step 2 are never released.
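
The sequence above can be modelled with a small, self-contained sketch. This is not SeaTunnel code: `RestoreSketch`, `SlotPool`, `Job`, and `restore` are hypothetical names, the guard in step 3 is only an assumption about what the green-box check does, and the pool ignores slots still held by the old tasks on the worker. It only illustrates how a job state of PENDING combined with a pipeline state of RUNNING can leave slots reserved without any deployment.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified, hypothetical model of the restore path; not SeaTunnel source code.
final class RestoreSketch {
    enum State { CREATED, PENDING, RUNNING }

    /** Fixed-size slot pool, mirroring dynamic-slot: false with slot-num: 16. */
    static final class SlotPool {
        private final int total;
        private int used;

        SlotPool(int total) { this.total = total; }

        /** Reserves n slots, or returns false when the pool cannot satisfy the request. */
        boolean apply(int n) {
            if (used + n > total) {
                return false;
            }
            used += n;
            return true;
        }

        int free() { return total - used; }
    }

    static final class Job {
        // Before the master switch, both the job and its pipeline are RUNNING.
        final AtomicReference<State> jobState = new AtomicReference<>(State.RUNNING);
        final AtomicReference<State> pipelineState = new AtomicReference<>(State.RUNNING);
        final int slots;

        Job(int slots) { this.slots = slots; }
    }

    /** Walks through steps 1 to 3; returns true only if the pipeline was deployed. */
    static boolean restore(Job job, SlotPool pool) {
        job.jobState.set(State.PENDING);              // step 1: only the job state changes
        if (!pool.apply(job.slots)) {                 // step 2: apply for slots
            throw new IllegalStateException("not enough slots");
        }
        // step 3: assumed guard that deploys only a pipeline that is still CREATED
        boolean deployed = job.pipelineState.compareAndSet(State.CREATED, State.RUNNING);
        if (deployed) {
            job.jobState.set(State.RUNNING);
        }
        // When deployed is false, nothing uses or releases the slots from step 2.
        return deployed;
    }

    public static void main(String[] args) {
        SlotPool pool = new SlotPool(16);
        boolean first = restore(new Job(4), pool);
        boolean second = restore(new Job(4), pool);
        System.out.println("deployed: " + first + ", " + second);
        System.out.println("free slots: " + pool.free());
        System.out.println("can a 12-slot job be scheduled? " + pool.apply(12));
    }
}
```

In this model both restores return false, 8 of the 16 slots stay reserved, and a later request for 12 slots is rejected, which corresponds to the resource exhaustion described in this report.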
   
   
   After I added some logging:
   <img width="2260" height="596" alt="Image" src="https://github.com/user-attachments/assets/f4171e4f-936c-4868-adec-80c0d4ec7e84" />
   the output was as follows:
   <img width="2156" height="894" alt="Image" src="https://github.com/user-attachments/assets/bdb0ef51-0ad2-4f69-8029-a3f674a31c9a" />
   
   ### SeaTunnel Version
   
   2.3.11
   
   ### SeaTunnel Config
   
   ```conf
   seatunnel:
     engine:
       slot-service:
         dynamic-slot: false
         slot-num: 16
   ```
   
   ### Running Command
   
   ```shell
   //
   ```
   
   ### Error Exception
   
   ```log
   Eventually this results in a NoEnoughResourceException.
   ```
   
   ### Zeta or Flink or Spark Version
   
   Zeta
   
   ### Java or Scala Version
   
   JDK 1.8
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

