simmonn opened a new issue, #15851:
URL: https://github.com/apache/dolphinscheduler/issues/15851

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   Version: 3.2.0
   JVM options: -Xmx3g -Xms3g -Xmn1g
   JDK: amazon-corretto-11.0.19.7.1-linux-x86_64
   My project has 60 scheduled RemoteShell tasks (executing PHP commands). After 
running for a while, the worker goes into frequent Full GC, which causes all tasks 
to fail and makes the worker nodes appear deadlocked. As a workaround, I had to change 
the RemoteShell tasks to Shell tasks whose command uses `ssh -i id_rsa ''`.
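
For reference, the Shell task workaround looks roughly like the sketch below. It is only an illustration: the target host is taken from the logs in this report, while the key path, the `BatchMode` and `ConnectTimeout` options, and the PHP script path are assumed placeholders, not my exact configuration.

```shell
#!/bin/sh
# Sketch of the workaround: run the remote command from a plain Shell task
# through the system ssh client instead of the RemoteShell plugin.
# SSH_KEY, SSH_TARGET and the PHP script path are hypothetical placeholders.
set -eu

SSH_KEY="${SSH_KEY:-id_rsa}"
SSH_TARGET="${SSH_TARGET:-root@172.19.23.121}"

# Run one command on the remote host.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout=5 (seconds) is an assumed value for the connect timeout.
run_remote() {
  ssh -i "$SSH_KEY" -o BatchMode=yes -o ConnectTimeout=5 "$SSH_TARGET" "$1"
}

# Example call inside the Shell task (script path is hypothetical):
# run_remote 'php /path/to/job.php'
```

Because each task starts its own short-lived `ssh` process, no SSH client state is kept inside the worker JVM between tasks.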
   
   Apart from some error logs, I also noticed WARN logs with NPE 
(NullPointerException) occurring every time a task is executed.
   `[WARN] 2024-04-10 04:01:27.782 +0800 
org.apache.sshd.client.session.ClientSessionImpl:[618] - 
[WorkflowInstance-0][TaskInstance-0] - 
exceptionCaught(ClientSessionImpl[root@/172.19.23.121:22])[state=Opened] 
NullPointerException: No customized heartbeat handler registered`
   
   Here is the error log:
   `[ERROR] 2024-04-10 04:01:01.146 +0800 
org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable:[181]
 - [WorkflowInstance-72475][TaskInstance-74145] - Task execute failed, due to 
meet an exception
   org.apache.dolphinscheduler.plugin.task.api.TaskException: Execute shell 
task error
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.handle(RemoteShellTask.java:110)
           at 
org.apache.dolphinscheduler.server.worker.runner.DefaultWorkerDelayTaskExecuteRunnable.executeTask(DefaultWorkerDelayTaskExecuteRunnable.java:57)
           at 
org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.run(WorkerTaskExecuteRunnable.java:175)
           at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
           at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
           at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: org.apache.dolphinscheduler.plugin.task.api.TaskException: Remote 
shell task error
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.run(RemoteExecutor.java:101)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.handle(RemoteShellTask.java:104)
           ... 9 common frames omitted
   Caused by: org.apache.dolphinscheduler.plugin.task.api.TaskException: SSH 
connection failed
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:83)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.runRemote(RemoteExecutor.java:208)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getTaskPid(RemoteExecutor.java:184)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.run(RemoteExecutor.java:91)
           ... 10 common frames omitted
   Caused by: org.apache.sshd.common.SshException: 
DefaultConnectFuture[root@/172.19.23.121:22]: Failed to get operation result 
within specified timeout: 5000
           at 
org.apache.sshd.common.future.AbstractSshFuture.formatExceptionMessage(AbstractSshFuture.java:185)
           at 
org.apache.sshd.common.future.AbstractSshFuture.verifyResult(AbstractSshFuture.java:111)
           at 
org.apache.sshd.client.future.DefaultConnectFuture.verify(DefaultConnectFuture.java:42)
           at 
org.apache.sshd.client.future.DefaultConnectFuture.verify(DefaultConnectFuture.java:34)
           at 
org.apache.dolphinscheduler.plugin.datasource.ssh.SSHUtils.getSession(SSHUtils.java:42)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:78)
           ... 13 common frames omitted
   [INFO] 2024-04-10 04:01:02.874 +0800 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask:[118] - 
[WorkflowInstance-72475][TaskInstance-74145] - kill remote task 
dolphinscheduler-remoteshell-74145
   [ERROR] 2024-04-10 04:01:02.875 +0800 
org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable:[140]
 - [WorkflowInstance-72475][TaskInstance-74145] - Cancel task failed, this will 
not affect the taskInstance status, but you need to check manual
   org.apache.dolphinscheduler.plugin.task.api.TaskException: cancel 
application error
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.cancel(RemoteShellTask.java:121)
           at 
org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.cancelTask(WorkerTaskExecuteRunnable.java:136)
           at 
org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.afterThrowing(WorkerTaskExecuteRunnable.java:118)
           at 
org.apache.dolphinscheduler.server.worker.runner.DefaultWorkerDelayTaskExecuteRunnable.afterThrowing(DefaultWorkerDelayTaskExecuteRunnable.java:67)
           at 
org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.run(WorkerTaskExecuteRunnable.java:182)
           at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
           at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
           at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: org.apache.dolphinscheduler.plugin.task.api.TaskException: SSH 
connection failed
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:83)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.runRemote(RemoteExecutor.java:208)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getTaskPid(RemoteExecutor.java:184)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.kill(RemoteExecutor.java:176)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.cancel(RemoteShellTask.java:119)
           ... 11 common frames omitted
   Caused by: java.lang.IllegalStateException: SshClient not started. Please 
call start() method before connecting to a server
           at org.apache.sshd.client.SshClient.doConnect(SshClient.java:627)
           at org.apache.sshd.client.SshClient.doConnect(SshClient.java:616)
           at org.apache.sshd.client.SshClient.connect(SshClient.java:547)
           at org.apache.sshd.client.SshClient.connect(SshClient.java:539)
           at 
org.apache.sshd.client.session.ClientSessionCreator.connect(ClientSessionCreator.java:74)
           at 
org.apache.sshd.client.session.ClientSessionCreator.connect(ClientSessionCreator.java:57)
           at 
org.apache.dolphinscheduler.plugin.datasource.ssh.SSHUtils.getSession(SSHUtils.java:41)
           at 
org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:78)
           ... 15 common frames omitted`
   
   Here is a snapshot of the host's memory usage:
   
   
![image](https://github.com/apache/dolphinscheduler/assets/37128453/d1c1bce5-8bf9-4d8c-9615-0b913c803f78)
   
   
   ### What you expected to happen
   
   RemoteShell tasks execute without memory leaks.
   
   ### How to reproduce
   
   Create RemoteShell tasks and schedule them to run at short intervals.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.2.x
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
