njnu-seafish opened a new issue, #17436:
URL: https://github.com/apache/dolphinscheduler/issues/17436

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   ### 1. Create a shell task and configure the timeout failure strategy
   
   <img width="1006" height="837" alt="Image" src="https://github.com/user-attachments/assets/d5b67942-6d3f-4687-8da8-3b541a268c47" />
   
   
   
   ### 2. Manually kill the task; the logs show that the kill succeeds. (**The cancelApplication method is called only once.**)
   
   2025-08-15 13:49:33.105 INFO  [WorkerRpcServer-methodInvoker-224] - Publish 
TaskExecutorKillLifecycleEvent: {
     "taskInstanceId" : 1081,
     "eventCreateTime" : 1755236973105,
     "type" : "KILL"
   }
   2025-08-15 13:49:33.147 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - 
Begin killing task instance, processId: 749659
   2025-08-15 13:49:33.449 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - 
prepare to parse pid, raw pid string: 
sudo(749659)---1081.sh(749674)---sleep(749748)
   
   2025-08-15 13:49:34.003 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - 
Sending SIGINT to process group: 749659 749674 749748, command: sudo -u 
dolphinscheduler -i kill -s SIGINT 749659 749674 749748
   2025-08-15 13:49:44.992 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Kill 
command: sudo -u dolphinscheduler -i kill -s SIGINT 749659 749674 749748, timed 
out, still running PIDs: 749659 749674 749748
   2025-08-15 13:49:45.545 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - 
Sending SIGTERM to process group: 749659 749674 749748, command: sudo -u 
dolphinscheduler -i kill -s SIGTERM 749659 749674 749748
   2025-08-15 13:49:46.253 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - 
Successfully killed process tree using SIGTERM, processId: 749659
   2025-08-15 13:49:46.254 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - 
Process tree for task: 1081 is killed or already finished, pid: 749659
   2025-08-15 13:49:46.254 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - Get 
appIds from worker 192.168.30.121:1234, taskLogPath: 
/data01/dolphinscheduler/20250815/149143631011392/1/1015/1081.log
   2025-08-15 13:49:46.254 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - 
Start finding appId in 
/data01/dolphinscheduler/20250815/149143631011392/1/1015/1081.log, fetch way: 
log 
   2025-08-15 13:49:46.254 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - The 
appId is empty
   2025-08-15 13:49:46.254 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-87] - 
Success fire TaskExecutorKillLifecycleEvent: {
     "taskInstanceId" : 1081,
     "eventCreateTime" : 1755236973105,
     "type" : "KILL"
   } 
   2025-08-15 13:49:46.360 INFO  [exclusive-task-executor-container-worker-0] - 
process has exited. execute path:/data01/dolphinscheduler/exec/process/1081, 
processId:749659 ,exitStatusCode:143 ,processWaitForStatus:true 
,processExitValue:143
   
   
   ### 3. However, an exception is thrown when the task is killed due to timeout. (**The cancelApplication method is called twice.**)
   
   
   2025-08-15 16:55:37.289 INFO  [WorkerRpcServer-methodInvoker-31] - Publish 
TaskExecutorKillLifecycleEvent: {
     "taskInstanceId" : 1084,
     "eventCreateTime" : 1755248137289,
     "type" : "KILL"
   }
   2025-08-15 16:55:37.333 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - Begin 
killing task instance, processId: 837363
   2025-08-15 16:55:37.730 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - 
prepare to parse pid, raw pid string: 
sudo(837363)---1084.sh(837379)---sleep(837453)
   
   2025-08-15 16:55:38.316 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - 
Sending SIGINT to process group: 837363 837379 837453, command: sudo -u 
dolphinscheduler -i kill -s SIGINT 837363 837379 837453
   2025-08-15 16:55:49.325 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - Kill 
command: sudo -u dolphinscheduler -i kill -s SIGINT 837363 837379 837453, timed 
out, still running PIDs: 837363 837379 837453
   2025-08-15 16:55:49.876 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-0] - 
Sending SIGTERM to process group: 837363 837379 837453, command: sudo -u 
dolphinscheduler -i kill -s SIGTERM 837363 837379 837453
   2025-08-15 16:55:50.166 ERROR [exclusive-task-executor-container-worker-0] - 
process has failure, the task timeout configuration value is:60, ready to kill 
...
   2025-08-15 16:55:50.167 INFO  [exclusive-task-executor-container-worker-0] - 
Begin killing task instance, processId: 837363
   2025-08-15 16:55:50.566 INFO  [exclusive-task-executor-container-worker-0] - 
prepare to parse pid, raw pid string: 
   2025-08-15 16:55:50.567 ERROR [exclusive-task-executor-container-worker-0] - 
Kill task instance error, processId: 837363
   java.lang.NumberFormatException: For input string: ""
        at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:592)
        at java.lang.Integer.parseInt(Integer.java:615)
        at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
        at 
org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils.kill(ProcessUtils.java:124)
        at 
org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.cancelApplication(AbstractCommandExecutor.java:216)
        at 
org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.run(AbstractCommandExecutor.java:196)
        at 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask.handle(ShellTask.java:85)
        at 
org.apache.dolphinscheduler.server.worker.executor.PhysicalTaskExecutor.doTriggerTaskPlugin(PhysicalTaskExecutor.java:74)
        at 
org.apache.dolphinscheduler.task.executor.AbstractTaskExecutor.start(AbstractTaskExecutor.java:80)
        at 
org.apache.dolphinscheduler.task.executor.worker.TaskExecutorWorker.start(TaskExecutorWorker.java:65)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   2025-08-15 16:55:50.567 ERROR [exclusive-task-executor-container-worker-0] - 
Failed to kill process tree for task: 1084, pid: 837363
   2025-08-15 16:55:50.567 INFO  [exclusive-task-executor-container-worker-0] - 
Get appIds from worker 192.168.30.121:1234, taskLogPath: 
/data01/dolphinscheduler/20250815/149143631011392/1/1018/1084.log
   2025-08-15 16:55:50.567 INFO  [exclusive-task-executor-container-worker-0] - 
Start finding appId in 
/data01/dolphinscheduler/20250815/149143631011392/1/1018/1084.log, fetch way: 
log 
   2025-08-15 16:55:50.567 INFO  [exclusive-task-executor-container-worker-0] - 
The appId is empty
   2025-08-15 16:55:50.568 INFO  [exclusive-task-executor-container-worker-0] - 
process has exited. execute path:/data01/dolphinscheduler/exec/process/1084, 
processId:837363 ,exitStatusCode:-1 ,processWaitForStatus:false 
,processExitValue:143
   2025-08-15 16:55:50.568 INFO  [exclusive-task-executor-container-worker-0] - 
Publish TaskExecutorFailedLifecycleEvent: {
     "taskInstanceId" : 1084,
     "eventCreateTime" : 1755248150568,
     "type" : "FAILED",
     "workflowInstanceId" : 1018,
     "workflowInstanceHost" : "192.168.30.11:5678",
     "taskInstanceHost" : "192.168.30.121:1234",
     "appIds" : "",
     "endTime" : 1755248150568,
     "latestReportTime" : null
   }
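
The `NumberFormatException: For input string: ""` in the trace above is raised inside `ProcessUtils.kill` when the raw pid string is empty, which is what the second `prepare to parse pid` log line shows. One way to tolerate that case is sketched below. `PidParser` and `parsePids` are hypothetical names used for illustration, not the project's actual code, and the sketch assumes the raw string has the `name(pid)---name(pid)` form seen in the logs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DolphinScheduler.
final class PidParser {

    // Matches the digits inside parentheses, e.g. the 749748 in "sleep(749748)".
    private static final Pattern PID_PATTERN = Pattern.compile("\\((\\d+)\\)");

    // Returns the PIDs found in a pstree-style string, or an empty list
    // when the string is null or blank (process tree already gone).
    static List<Integer> parsePids(String rawPidString) {
        List<Integer> pids = new ArrayList<>();
        if (rawPidString == null || rawPidString.trim().isEmpty()) {
            return pids;
        }
        Matcher matcher = PID_PATTERN.matcher(rawPidString);
        while (matcher.find()) {
            pids.add(Integer.parseInt(matcher.group(1)));
        }
        return pids;
    }

    public static void main(String[] args) {
        // prints [749659, 749674, 749748]
        System.out.println(parsePids("sudo(749659)---1081.sh(749674)---sleep(749748)"));
        // prints []
        System.out.println(parsePids(""));
    }
}
```

With a parser like this, a caller could treat an empty list as "nothing left to kill" instead of failing the kill.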
   
   ### What you expected to happen
   
   1. Killing a task on timeout should not throw an exception.
   2. Ideally, the kill action should be triggered only once.
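
The second expectation could be met with a one-shot guard around the kill action. The following is a minimal sketch under that assumption, not the project's implementation: `KillOnceGuard` and `killOnce` are hypothetical names, and the `Runnable` stands in for whatever actually performs the kill.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard, not part of DolphinScheduler.
final class KillOnceGuard {

    private final AtomicBoolean killRequested = new AtomicBoolean(false);
    private final Runnable killAction;

    KillOnceGuard(Runnable killAction) {
        this.killAction = killAction;
    }

    // Runs the kill action on the first call only, even when several
    // threads call concurrently. Returns true if this call ran the action.
    boolean killOnce() {
        if (killRequested.compareAndSet(false, true)) {
            killAction.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        KillOnceGuard guard = new KillOnceGuard(calls::incrementAndGet);
        System.out.println(guard.killOnce()); // prints true
        System.out.println(guard.killOnce()); // prints false
        System.out.println(calls.get());      // prints 1
    }
}
```

One trade-off of this design: if the first kill attempt fails, later calls are skipped as well, so a real fix would need to decide whether a failed attempt should reset the flag.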
   
   ### How to reproduce
   
   1. Create a shell task and configure the timeout failure strategy
   
   <img width="1006" height="837" alt="Image" src="https://github.com/user-attachments/assets/d5b67942-6d3f-4687-8da8-3b541a268c47" />
   
   2. Run the workflow and wait for the task to be killed after the timeout.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

