xujiangfeng001 opened a new issue, #3240:
URL: https://github.com/apache/incubator-streampark/issues/3240

   ### Search before asking
   
   - [X] I had searched in the 
[feature](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22Feature%22)
 and found no similar feature requirement.
   
   
   ### Description
   
   This issue is a derivative of 
https://github.com/apache/incubator-streampark/issues/2944 and primarily 
discusses the optimization of the auto-probe job process. The diagram below 
illustrates the entire job auto-probe process.
   
   
![Auto-probe feature flowchart](https://github.com/apache/incubator-streampark/assets/104614523/4a322374-84b6-4f21-861c-98ac8e6c0c9c)
   
   In the optimization of the entire job auto-probe process, we particularly 
focused on the following aspects:
   
   1. We added a manual job probe button on the StreamPark front-end page to 
prevent jobs from escaping the probe process after multiple failed attempts, 
ensuring that they can be relaunched by the job probe monitoring system.
   2. When jobs run in remote, YARN session, or K8s session mode, to keep the 
job's state consistent with the state of the cluster it is deployed on, we have 
introduced the following logic:
        a. If a job is successfully probed and found in the running state 
while its associated cluster is in a LOST state, we update the cluster's status 
to running.
        b. If, after the jobs under a cluster have been successfully probed, 
none of them is found in the running state, the cluster's probe must be 
triggered manually to update the cluster's status.
   3. For each probe round, we define the end-of-round criterion as follows: 
the current round is considered complete if no jobs are in a LOST state, or if 
every job still in a LOST state has reached the maximum probe retry count. At 
this point, we notify the user of the probe round's statistics.
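
The rules in items 2 and 3 can be sketched roughly as follows. This is only an 
illustrative Python sketch, not StreamPark code: the names `State`, `Job`, 
`Cluster`, `reconcile_cluster`, `round_completed`, and the 
`needs_manual_probe` flag are all hypothetical, and rule 2b is interpreted 
here as merely flagging the cluster for a manually triggered probe.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    # Hypothetical, simplified state set; the real project has more states.
    RUNNING = "RUNNING"
    LOST = "LOST"
    OTHER = "OTHER"


@dataclass
class Job:
    name: str
    state: State
    probe_retries: int = 0  # number of probe attempts made so far


@dataclass
class Cluster:
    name: str
    state: State
    needs_manual_probe: bool = False  # assumption: a flag models rule 2b


def reconcile_cluster(cluster: Cluster, probed_jobs: List[Job]) -> None:
    """Apply rules 2a/2b once the jobs under a cluster have been probed."""
    if any(job.state is State.RUNNING for job in probed_jobs):
        # Rule 2a: a running job means a LOST cluster is actually running.
        if cluster.state is State.LOST:
            cluster.state = State.RUNNING
    else:
        # Rule 2b: no running job, so the cluster state cannot be inferred
        # from its jobs; the cluster itself has to be probed manually.
        cluster.needs_manual_probe = True


def round_completed(jobs: List[Job], max_retries: int) -> bool:
    """Rule 3: done when every LOST job has used up its probe retries."""
    return all(
        job.state is not State.LOST or job.probe_retries >= max_retries
        for job in jobs
    )
```

With a retry limit of 3, a round containing one running job and one LOST job 
that has already been probed 3 times counts as complete, and a LOST cluster 
hosting that running job is set back to running.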
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.