github-actions[bot] commented on issue #17316: URL: https://github.com/apache/dolphinscheduler/issues/17316#issuecomment-3031274444
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

The latest version was deployed as the dolphinscheduler user. I created a shell task, selected the default tenant, submitted and ran it, and then killed the task from the UI. The task process on the Worker machine was not actually killed, yet the task instance and the workflow instance were both shown as killed.

### What you expected to happen

After the workflow is killed from the UI, if the task instance is shown as killed, the corresponding task processes on the Worker machine should actually have been terminated.

Code file location: dolphinscheduler-task-plugin/dolphinscheduler-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/utils/ProcessUtils.java

The buggy method is:

```java
public static boolean kill(@NonNull TaskExecutionContext request) {
    try {
        log.info("Begin killing task instance, processId: {}", request.getProcessId());
        int processId = request.getProcessId();
        if (processId == 0) {
            log.info("Task instance has already finished, no need to kill");
            return true;
        }

        // Get all child processes
        String pids = getPidsStr(processId);
        String[] pidArray = pids.split("\\s+");
        if (pidArray.length == 0) {
            log.warn("No valid PIDs found for process: {}", processId);
            return true;
        }

        // 1. Try to terminate gracefully (SIGINT)
        boolean gracefulKillSuccess = sendKillSignal("SIGINT", pids, request.getTenantCode());
        if (gracefulKillSuccess) {
            log.info("Successfully killed process tree using SIGINT, processId: {}", processId);
            return true;
        }

        // 2. Try to terminate forcefully (SIGTERM)
        boolean termKillSuccess = sendKillSignal("SIGTERM", pids, request.getTenantCode());
        if (termKillSuccess) {
            log.info("Successfully killed process tree using SIGTERM, processId: {}", processId);
            return true;
        }

        // 3. As a last resort, use `kill -9`
        log.warn("SIGINT & SIGTERM failed, using SIGKILL as a last resort for processId: {}", processId);
        boolean forceKillSuccess = sendKillSignal("SIGKILL", pids, request.getTenantCode());
        if (forceKillSuccess) {
            log.info("Successfully sent SIGKILL signal to process tree, processId: {}", processId);
        } else {
            log.error("Error sending SIGKILL signal to process tree, processId: {}", processId);
        }
        return forceKillSuccess;
    } catch (Exception e) {
        log.error("Kill task instance error, processId: {}", request.getProcessId(), e);
        return false;
    }
}
```

The `sendKillSignal` method only sends the signal to the processes; it does not guarantee that the operating system has actually terminated them. Because `kill` returns as soon as `sendKillSignal("SIGINT", ...)` reports success, a process that ignores or survives SIGINT is never sent SIGTERM or SIGKILL. Logic needs to be added to check whether the process is still alive after the kill signal has been sent.

### How to reproduce

1. The shell task script is as follows:

   ```
   echo ${JAVE_HOME}; sleep 10m
   ```

2. Kill the workflow from the UI. After a while, the task instance and the workflow instance are displayed as killed. The log of the task instance is as follows:

   ```
   Executing shell command : sudo -u dolphinscheduler -i /data01/dolphinscheduler/exec/process/87/87.sh
   process start, process id is: 3502853
   ......
   Begin killing task instance, processId: 3502853
   prepare to parse pid, raw pid string: sudo(3502853)---87.sh(3502868)----sleep(3502945)
   Sending SIGINT to process group: 3502853 3502868 3502945, command: sudo -u dolphinscheduler kill -s SIGINT 3502853 3502868 3502945
   Successfully killed process tree using SIGINT, processId: 3502853
   Process tree for task: 87 is killed or already finished, pid: 3502853
   ```

3. However, running `pstree` on the Worker machine shows that the task process still exists and has not been killed:

   ```
   $ pstree -p 3502853
   sudo(3502853)───87.sh(3502868)───sleep(3502945)
   ```

### Anything else

_No response_

### Version

3.3.0-alpha

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!
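
The survival check asked for under "What you expected to happen" could look roughly like the sketch below. It is only an illustration, not the project's code: `KillVerifier`, `SignalSender`, `survivors`, `awaitExit` and `killAndVerify` are hypothetical names, and `SignalSender` merely stands in for the existing `sendKillSignal`. It assumes Java 9+ (for `ProcessHandle`) and that the Worker JVM can see the task processes. The idea is to escalate to the next signal only when the processes are still alive after a short wait, and to report success only when none survive.

```java
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of DolphinScheduler): send a signal, then verify the processes exited. */
final class KillVerifier {

    /** Hypothetical stand-in for sendKillSignal; returns true if the signal command itself succeeded. */
    @FunctionalInterface
    interface SignalSender {
        boolean send(String signal, List<Long> pids);
    }

    // Escalation order taken from the kill method quoted in this issue.
    private static final List<String> SIGNALS = List.of("SIGINT", "SIGTERM", "SIGKILL");

    private KillVerifier() {
    }

    /** True if the operating system still reports a live process with this pid. */
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    /** Returns the subset of pids that are still alive. */
    static List<Long> survivors(List<Long> pids) {
        return pids.stream().filter(KillVerifier::isAlive).collect(Collectors.toList());
    }

    /** Polls until every pid has exited or the timeout elapses; returns true if none survive. */
    static boolean awaitExit(List<Long> pids, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (survivors(pids).isEmpty()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.max(1, pollInterval.toMillis()));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and report the current state.
                Thread.currentThread().interrupt();
                return survivors(pids).isEmpty();
            }
        }
    }

    /**
     * Sends each signal in turn, but only to processes that are still alive,
     * and returns true only once no process in the list survives.
     */
    static boolean killAndVerify(List<Long> pids, SignalSender sender, Duration waitPerSignal) {
        List<Long> remaining = survivors(pids);
        for (String signal : SIGNALS) {
            if (remaining.isEmpty()) {
                return true;
            }
            // The sender's return value only says the signal was sent, so it is not trusted here.
            sender.send(signal, remaining);
            if (awaitExit(remaining, waitPerSignal, Duration.ofMillis(100))) {
                return true;
            }
            remaining = survivors(remaining);
        }
        return remaining.isEmpty();
    }
}
```

In `ProcessUtils.kill`, the sender could wrap the existing call, for example `(signal, pids) -> sendKillSignal(signal, joinedPids, tenantCode)`, where `joinedPids` and `tenantCode` are placeholders for values the method already has. Two caveats: an unreaped zombie may still be reported as alive, and PID reuse can in principle make a dead process look alive, so a real fix may need to compare start times as well.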
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
