github-actions[bot] commented on issue #17316:
URL: https://github.com/apache/dolphinscheduler/issues/17316#issuecomment-3031274444

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   The latest version was deployed as the dolphinscheduler user.
   Create a new Shell task, select the default `default` tenant, submit and
   run it, and then kill the task from the UI.
   The task process on the Worker machine was not actually killed, even though
   the task instance and workflow instance were both shown as killed.
   
   ### What you expected to happen
   
   After the workflow is killed from the UI and the task instance is shown as
   killed, the corresponding task processes on the Worker machine should
   actually have been terminated.
   
   Code file location: 
dolphinscheduler-task-plugin/dolphinscheduler-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/utils/ProcessUtils.java
   The method containing the bug is:
   `
   public static boolean kill(@NonNull TaskExecutionContext request) {
       try {
           log.info("Begin killing task instance, processId: {}", request.getProcessId());
           int processId = request.getProcessId();
           if (processId == 0) {
               log.info("Task instance has already finished, no need to kill");
               return true;
           }
   
           // Get all child processes
           String pids = getPidsStr(processId);
           String[] pidArray = pids.split("\\s+");
           if (pidArray.length == 0) {
               log.warn("No valid PIDs found for process: {}", processId);
               return true;
           }
   
           // 1. Try to terminate gracefully (SIGINT)
           boolean gracefulKillSuccess = sendKillSignal("SIGINT", pids, request.getTenantCode());
           if (gracefulKillSuccess) {
               log.info("Successfully killed process tree using SIGINT, processId: {}", processId);
               return true;
           }
   
           // 2. Try to terminate forcefully (SIGTERM)
           boolean termKillSuccess = sendKillSignal("SIGTERM", pids, request.getTenantCode());
           if (termKillSuccess) {
               log.info("Successfully killed process tree using SIGTERM, processId: {}", processId);
               return true;
           }
   
           // 3. As a last resort, use `kill -9`
           log.warn("SIGINT & SIGTERM failed, using SIGKILL as a last resort for processId: {}", processId);
           boolean forceKillSuccess = sendKillSignal("SIGKILL", pids, request.getTenantCode());
           if (forceKillSuccess) {
               log.info("Successfully sent SIGKILL signal to process tree, processId: {}", processId);
           } else {
               log.error("Error sending SIGKILL signal to process tree, processId: {}", processId);
           }
           return forceKillSuccess;
   
       } catch (Exception e) {
           log.error("Kill task instance error, processId: {}", request.getProcessId(), e);
           return false;
       }
   }
   `
   The sendKillSignal method only sends the signal to the processes; it does
   not verify that the operating system has actually terminated them. Logic
   should be added to check whether the processes are still alive after the
   kill signal has been sent.
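
A minimal sketch of the kind of liveness check described above, not the project's actual fix. It assumes Java 9+ so that `ProcessHandle` is available; the class `KillVerifier` and its methods `isAlive` and `awaitTermination` are hypothetical names introduced only for illustration.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical helper (not part of DolphinScheduler): polls the OS to confirm
// that a set of PIDs has really exited after a kill signal was sent.
final class KillVerifier {

    private KillVerifier() {
    }

    // Returns true if the OS still reports a live process for this PID.
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // Polls until none of the PIDs is alive (returns true) or the timeout
    // elapses (returns false). Also returns false if the thread is interrupted.
    static boolean awaitTermination(List<Long> pids, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (pids.stream().noneMatch(KillVerifier::isAlive)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.max(1L, pollInterval.toMillis()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Under these assumptions, `kill` could call something like `awaitTermination` after each `sendKillSignal` and only return `true` once it reports that every PID is gone, escalating from SIGINT to SIGTERM to SIGKILL otherwise. The timeout and poll interval would need tuning for real workloads.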
   
   ### How to reproduce
   
   1. The shell task script is as follows:
   `
   echo ${JAVE_HOME};
   sleep 10m
   `
   
   2. Kill the workflow from the UI. After a while, the task instance and
   workflow instance are both displayed as killed. The task instance log is as
   follows:
   `
   Executing shell command : sudo -u dolphinscheduler -i /data01/dolphinscheduler/exec/process/87/87.sh
   process start, process id is: 3502853
   ......
   Begin killing task instance, processId: 3502853
   prepare to parse pid, raw pid string: sudo(3502853)---87.sh(3502868)----sleep(3502945)
   Sending SIGINT to process group: 3502853 3502868 3502945, command: sudo -u dolphinscheduler kill -s SIGINT 3502853 3502868 3502945
   Successfully killed process tree using SIGINT, processId: 3502853
   Process tree for task: 87 is killed or already finished, pid: 3502853
   `
   
   3. However, running the pstree command on the Worker machine shows that the
   task processes still exist and were not killed:
   `
   $ pstree -p 3502853
   sudo(3502853)───87.sh(3502868)───sleep(3502945)
   `
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.3.0-alpha
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)

