njnu-seafish opened a new issue, #17359:
URL: https://github.com/apache/dolphinscheduler/issues/17359

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   I tried to kill a Spark job running on YARN, but the kill failed.
   At first the logs showed "yarn: command not found".
   After fixing that, the logs show that killing the YARN application failed with 
an ExitCodeException. The exit code is 0, but errMsg is not null.
   
   
---------------------------------------------------------------------------------------------
   Here are the first logs:
   
---------------------------------------------------------------------------------------------
   2025-07-22 10:48:27.128 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-36] - kill 
cmd:sudo -u dolphinscheduler sh 
/data01/dolphinscheduler/exec/process/147/application_1749462877863_5866.kill
   2025-07-22 10:48:27.151 ERROR 
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-36] - Kill 
yarn application [[application_1749462877863_5866]] failed
   org.apache.dolphinscheduler.common.shell.AbstractShell$ExitCodeException: 
/data01/dolphinscheduler/exec/process/147/application_1749462877863_5866.kill: 
line 10: yarn: command not found
        at 
org.apache.dolphinscheduler.common.shell.AbstractShell.runCommand(AbstractShell.java:205)
        at 
org.apache.dolphinscheduler.common.shell.AbstractShell.run(AbstractShell.java:118)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execute(ShellExecutor.java:125)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execCommand(ShellExecutor.java:103)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execCommand(ShellExecutor.java:86)
        at 
org.apache.dolphinscheduler.common.utils.OSUtils.exeShell(OSUtils.java:342)
        at 
org.apache.dolphinscheduler.common.utils.OSUtils.exeCmd(OSUtils.java:331)
        at 
org.apache.dolphinscheduler.plugin.task.api.am.YarnApplicationManager.execYarnKillCommand(YarnApplicationManager.java:91)
        at 
org.apache.dolphinscheduler.plugin.task.api.am.YarnApplicationManager.killApplication(YarnApplicationManager.java:51)
        at 
org.apache.dolphinscheduler.plugin.task.api.am.YarnApplicationManager.killApplication(YarnApplicationManager.java:38)
        at 
org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils.cancelApplication(ProcessUtils.java:345)
        at 
org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.cancelApplication(AbstractCommandExecutor.java:226)
        at 
org.apache.dolphinscheduler.plugin.task.api.AbstractYarnTask.cancelApplication(AbstractYarnTask.java:91)
        at 
org.apache.dolphinscheduler.plugin.task.api.AbstractRemoteTask.cancel(AbstractRemoteTask.java:39)
        at 
org.apache.dolphinscheduler.server.worker.executor.PhysicalTaskExecutor.kill(PhysicalTaskExecutor.java:102)
        at 
org.apache.dolphinscheduler.task.executor.listener.TaskExecutorLifecycleEventListener.onTaskExecutorKillLifecycleEvent(TaskExecutorLifecycleEventListener.java:88)
        at 
org.apache.dolphinscheduler.task.executor.eventbus.TaskExecutorEventBusCoordinator.doFireTaskExecutorEventBus(TaskExecutorEventBusCoordinator.java:166)
        at 
org.apache.dolphinscheduler.task.executor.eventbus.TaskExecutorEventBusCoordinator.lambda$fireTaskExecutorEventBus$1(TaskExecutorEventBusCoordinator.java:123)
        at 
java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
        at 
java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
        at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1646)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   2025-07-22 10:48:27.151 ERROR 
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-36] - 
Cancel application failed: 
/data01/dolphinscheduler/exec/process/147/application_1749462877863_5866.kill: 
line 10: yarn: command not found
   
   
   
---------------------------------------------------------------------------------------------
   After fixing this, here are the second logs:
   
---------------------------------------------------------------------------------------------
   2025-07-22 14:45:15.928 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - 
Successfully killed process tree using SIGTERM, processId: 1219746
   2025-07-22 14:45:15.928 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - 
Process tree for task: 150 is killed or already finished, pid: 1219746
   2025-07-22 14:45:15.928 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - Get 
appIds from worker xxxxx:1234, taskLogPath: 
/data01/dolphinscheduler/20250722/145403649079392/5/103/150.log
   2025-07-22 14:45:15.928 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - 
Start finding appId in 
/data01/dolphinscheduler/20250722/145403649079392/5/103/150.log, fetch way: log 
   2025-07-22 14:45:15.929 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - Find 
appId: application_1749462877863_5903 from 
/data01/dolphinscheduler/20250722/145403649079392/5/103/150.log
   2025-07-22 14:45:15.930 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - get 
kerberos init command
   2025-07-22 14:45:15.930 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - 
kerberos init command: export KRB5_CONFIG=/etc/krb5.conf
   
   kinit -k -t /etc/security/keytabs/hdfs.keytab hdfs/xxxxx || true
   
   2025-07-22 14:45:15.930 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - kill 
cmd:sudo -u dolphinscheduler -i sh 
/data01/dolphinscheduler/exec/process/150/application_1749462877863_5903.kill
   2025-07-22 14:45:17.398 INFO  
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - 
exitCode: 0
   2025-07-22 14:45:17.399 ERROR 
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - Kill 
yarn application [[application_1749462877863_5903]] failed
   org.apache.dolphinscheduler.common.shell.AbstractShell$ExitCodeException: 
2025-07-22 14:45:17,383 | INFO | impl.YarnClientImpl | Killed application 
application_1749462877863_5903
        at 
org.apache.dolphinscheduler.common.shell.AbstractShell.runCommand(AbstractShell.java:206)
        at 
org.apache.dolphinscheduler.common.shell.AbstractShell.run(AbstractShell.java:118)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execute(ShellExecutor.java:125)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execCommand(ShellExecutor.java:103)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execCommand(ShellExecutor.java:86)
        at 
org.apache.dolphinscheduler.common.utils.OSUtils.exeShell(OSUtils.java:343)
        at 
org.apache.dolphinscheduler.common.utils.OSUtils.exeCmd(OSUtils.java:332)
        at 
org.apache.dolphinscheduler.plugin.task.api.am.YarnApplicationManager.execYarnKillCommand(YarnApplicationManager.java:91)
        at 
org.apache.dolphinscheduler.plugin.task.api.am.YarnApplicationManager.killApplication(YarnApplicationManager.java:51)
        at 
org.apache.dolphinscheduler.plugin.task.api.am.YarnApplicationManager.killApplication(YarnApplicationManager.java:38)
        at 
org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils.cancelApplication(ProcessUtils.java:345)
        at 
org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.cancelApplication(AbstractCommandExecutor.java:226)
        at 
org.apache.dolphinscheduler.plugin.task.api.AbstractYarnTask.cancelApplication(AbstractYarnTask.java:91)
        at 
org.apache.dolphinscheduler.plugin.task.api.AbstractRemoteTask.cancel(AbstractRemoteTask.java:39)
        at 
org.apache.dolphinscheduler.server.worker.executor.PhysicalTaskExecutor.kill(PhysicalTaskExecutor.java:102)
        at 
org.apache.dolphinscheduler.task.executor.listener.TaskExecutorLifecycleEventListener.onTaskExecutorKillLifecycleEvent(TaskExecutorLifecycleEventListener.java:88)
        at 
org.apache.dolphinscheduler.task.executor.eventbus.TaskExecutorEventBusCoordinator.doFireTaskExecutorEventBus(TaskExecutorEventBusCoordinator.java:166)
        at 
org.apache.dolphinscheduler.task.executor.eventbus.TaskExecutorEventBusCoordinator.lambda$fireTaskExecutorEventBus$1(TaskExecutorEventBusCoordinator.java:123)
        at 
java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
        at 
java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
        at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1646)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   2025-07-22 14:45:17.399 ERROR 
[PhysicalTaskExecutorEventBusCoordinator-eventbus-coordinator-worker-97] - 
Cancel application failed: 2025-07-22 14:45:17,383 | INFO | impl.YarnClientImpl 
| Killed application application_1749462877863_5903
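
The second log can be illustrated with a small shell sketch. It assumes that the `INFO | impl.YarnClientImpl | Killed application ...` line reaches the worker on stderr, which the non-null errMsg suggests. `fake_yarn_kill` and `run_and_classify` are hypothetical names used only for illustration; they are not part of DolphinScheduler or Hadoop. The sketch judges success by the exit code alone and treats stderr as diagnostic text.

```shell
#!/bin/sh
# Hypothetical stand-in for `yarn application -kill`: it exits with 0
# but writes its INFO log line to stderr.
fake_yarn_kill() {
    echo "Killing application $1"
    echo "INFO | impl.YarnClientImpl | Killed application $1" >&2
    return 0
}

# Run the given command, capture stderr separately, and classify the
# result by exit code only. Non-empty stderr alone is not a failure.
run_and_classify() {
    err_file=$(mktemp)
    "$@" >/dev/null 2>"$err_file"
    code=$?
    err_msg=$(cat "$err_file")
    rm -f "$err_file"
    if [ "$code" -eq 0 ]; then
        echo "success (exit code 0, stderr: ${err_msg:-<empty>})"
    else
        echo "failure (exit code $code, stderr: ${err_msg:-<empty>})"
        return "$code"
    fi
}

run_and_classify fake_yarn_kill application_1749462877863_5903
```

With this classification the kill above would count as successful, because the exit code is 0 even though stderr is not empty.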
   
   
   ### What you expected to happen
   
   DolphinScheduler should terminate the YARN application successfully.
   
   ### How to reproduce
   
   I don't know whether this is specific to my environment, but it fails 
consistently on my end.
   Has anyone else encountered a similar problem?
   
   ### Anything else
   
   
---------------------------------------------------------------------------------------------
   For the first problem (yarn: command not found), my test is as follows.
   
---------------------------------------------------------------------------------------------
   [root@xxxxx][~]
   # sudo -u dolphinscheduler yarn version
   sudo: yarn: command not found
   
   
   [root@xxxxx][~]
   # sudo -u dolphinscheduler -i yarn version
   Hadoop 3.3.3
   Source code repository Unknown -r Unknown
   Compiled by root on 2023-07-31T01:58Z
   Compiled with protoc 3.7.1
   From source with checksum 9437955990f3957351278654266784fc
   This command was run using 
/usr/local/hadoop-3.3.3_ccdp3.3.3_1.0.2/share/hadoop/common/hadoop-common-3.3.3.jar
   
   
   [root@xxxxx][~]
   # su - dolphinscheduler
   
   [dolphinscheduler@xxxxx][~]
   $ yarn version
   Hadoop 3.3.3
   Source code repository Unknown -r Unknown
   Compiled by root on 2023-07-31T01:58Z
   Compiled with protoc 3.7.1
   From source with checksum 9437955990f3957351278654266784fc
   This command was run using 
/usr/local/hadoop-3.3.3_ccdp3.3.3_1.0.2/share/hadoop/common/hadoop-common-3.3.3.jar
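
One plausible reading of the transcript above is that `yarn` is only on the PATH set up by the login profile, which `sudo -u dolphinscheduler -i` and `su -` load but plain `sudo -u dolphinscheduler` does not. The sketch below checks whether a command resolves under a given PATH. `resolves_with_path`, `MINIMAL_PATH`, and the default `HADOOP_BIN` value are assumptions for illustration; replace them with the values from your own environment.

```shell
#!/bin/sh
# Returns 0 if command $2 can be resolved when PATH is set to $1.
# The subshell keeps the caller's own PATH untouched.
resolves_with_path() {
    ( PATH="$1"; command -v "$2" >/dev/null 2>&1 )
}

# Hypothetical values: a minimal PATH such as a non-login `sudo -u` call
# might see, and the same PATH extended with a Hadoop bin directory that
# a login profile might add. Set HADOOP_BIN to the real location locally.
MINIMAL_PATH="/usr/bin:/bin"
HADOOP_BIN="${HADOOP_BIN:-/opt/hadoop/bin}"

for candidate in "$MINIMAL_PATH" "$MINIMAL_PATH:$HADOOP_BIN"; do
    if resolves_with_path "$candidate" yarn; then
        echo "yarn found with PATH=$candidate"
    else
        echo "yarn NOT found with PATH=$candidate"
    fi
done
```

If `yarn` is found only with the extended PATH, that would be consistent with the kill script working once the command is run with `-i`, as in the second log.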
   
   
   
   
---------------------------------------------------------------------------------------------
   For the second problem (exit code 0 but errMsg not null), my test is as follows.
   
---------------------------------------------------------------------------------------------
   [root@xxxxx][/usr/local/dolphinscheduler]
   # yarn application -kill application_1749462877863_5866
   Killing application application_1749462877863_5866
   2025-07-22 14:03:59,361 | INFO | impl.YarnClientImpl | Killed application 
application_1749462877863_5866
   
   [root@xxxxx][/usr/local/dolphinscheduler]
   # echo $?
   
   
   ### Version
   
   3.3.0-alpha
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
