simmonn opened a new issue, #15851: URL: https://github.com/apache/dolphinscheduler/issues/15851
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

- Version: 3.2.0
- JVM options: -Xmx3g -Xms3g -Xmn1g
- JDK: amazon-corretto-11.0.19.7.1-linux-x86_64

My project has 60 scheduled RemoteShell tasks (executing PHP commands). After running for a while, Full GC occurs frequently, causing all tasks to fail and leading to false deadlocks on the worker nodes. So I had to change the RemoteShell tasks to Shell tasks whose command uses ssh -i id_rsa ''.

Apart from some error logs, I also noticed WARN logs with an NPE (NullPointerException) every time a task is executed.

`[WARN] 2024-04-10 04:01:27.782 +0800 org.apache.sshd.client.session.ClientSessionImpl:[618] - [WorkflowInstance-0][TaskInstance-0] - exceptionCaught(ClientSessionImpl[root@/172.19.23.121:22])[state=Opened] NullPointerException: No customized heartbeat handler registered`

Here is the error log:

`[ERROR] 2024-04-10 04:01:01.146 +0800 org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable:[181] - [WorkflowInstance-72475][TaskInstance-74145] - Task execute failed, due to meet an exception
org.apache.dolphinscheduler.plugin.task.api.TaskException: Execute shell task error
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.handle(RemoteShellTask.java:110)
    at org.apache.dolphinscheduler.server.worker.runner.DefaultWorkerDelayTaskExecuteRunnable.executeTask(DefaultWorkerDelayTaskExecuteRunnable.java:57)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.run(WorkerTaskExecuteRunnable.java:175)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.dolphinscheduler.plugin.task.api.TaskException: Remote shell task error
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.run(RemoteExecutor.java:101)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.handle(RemoteShellTask.java:104)
    ... 9 common frames omitted
Caused by: org.apache.dolphinscheduler.plugin.task.api.TaskException: SSH connection failed
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:83)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.runRemote(RemoteExecutor.java:208)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getTaskPid(RemoteExecutor.java:184)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.run(RemoteExecutor.java:91)
    ... 10 common frames omitted
Caused by: org.apache.sshd.common.SshException: DefaultConnectFuture[root@/172.19.23.121:22]: Failed to get operation result within specified timeout: 5000
    at org.apache.sshd.common.future.AbstractSshFuture.formatExceptionMessage(AbstractSshFuture.java:185)
    at org.apache.sshd.common.future.AbstractSshFuture.verifyResult(AbstractSshFuture.java:111)
    at org.apache.sshd.client.future.DefaultConnectFuture.verify(DefaultConnectFuture.java:42)
    at org.apache.sshd.client.future.DefaultConnectFuture.verify(DefaultConnectFuture.java:34)
    at org.apache.dolphinscheduler.plugin.datasource.ssh.SSHUtils.getSession(SSHUtils.java:42)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:78)
    ... 13 common frames omitted
[INFO] 2024-04-10 04:01:02.874 +0800 org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask:[118] - [WorkflowInstance-72475][TaskInstance-74145] - kill remote task dolphinscheduler-remoteshell-74145
[ERROR] 2024-04-10 04:01:02.875 +0800 org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable:[140] - [WorkflowInstance-72475][TaskInstance-74145] - Cancel task failed, this will not affect the taskInstance status, but you need to check manual
org.apache.dolphinscheduler.plugin.task.api.TaskException: cancel application error
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.cancel(RemoteShellTask.java:121)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.cancelTask(WorkerTaskExecuteRunnable.java:136)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.afterThrowing(WorkerTaskExecuteRunnable.java:118)
    at org.apache.dolphinscheduler.server.worker.runner.DefaultWorkerDelayTaskExecuteRunnable.afterThrowing(DefaultWorkerDelayTaskExecuteRunnable.java:67)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.run(WorkerTaskExecuteRunnable.java:182)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.dolphinscheduler.plugin.task.api.TaskException: SSH connection failed
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:83)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.runRemote(RemoteExecutor.java:208)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getTaskPid(RemoteExecutor.java:184)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.kill(RemoteExecutor.java:176)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.cancel(RemoteShellTask.java:119)
    ... 11 common frames omitted
Caused by: java.lang.IllegalStateException: SshClient not started. Please call start() method before connecting to a server
    at org.apache.sshd.client.SshClient.doConnect(SshClient.java:627)
    at org.apache.sshd.client.SshClient.doConnect(SshClient.java:616)
    at org.apache.sshd.client.SshClient.connect(SshClient.java:547)
    at org.apache.sshd.client.SshClient.connect(SshClient.java:539)
    at org.apache.sshd.client.session.ClientSessionCreator.connect(ClientSessionCreator.java:74)
    at org.apache.sshd.client.session.ClientSessionCreator.connect(ClientSessionCreator.java:57)
    at org.apache.dolphinscheduler.plugin.datasource.ssh.SSHUtils.getSession(SSHUtils.java:41)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:78)
    ... 15 common frames omitted`

Here is the snapshot of the host's memory: (image not included in this message)

### What you expected to happen

RemoteShell tasks execute without memory leaks.

### How to reproduce

Create RemoteShell tasks and schedule them within a short period of time.

### Anything else

_No response_

### Version

3.2.x

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
