DarkAssassinator commented on issue #11652: URL: https://github.com/apache/dolphinscheduler/issues/11652#issuecomment-1250034498
> These are my humble opinions. If you have any questions, please let me know. @DarkAssassinator

Hi @SbloodyS, thank you so much for your suggestions.

> I think it's better not to use userName/passWord in ssh since there are some security risks. Using a pam file or authorized_key in ssh is a more secure way.

You are right from a security point of view, but this increases the cost of use for the user, because the user would need to create and download many host authorized_key files. Maybe we can add password/authorized_key options in the UI so that the user can select the SSH policy.

> In common usage scenarios, the masterServer/apiServer's node usually does not contain the permission to use HDFS and S3. These permissions are usually included in the workerServer's node. It requires these permissions on the user's masterServer/apiServer's node if using the scp command to transfer the files to the task server. In addition, downloading files from the masterServer/apiServer's node and then scping them to the task node will waste network IO and hard disk IO for some large files or a large number of small files.

Sure, but not all tasks are suitable for running on remote servers; maybe just Shell/Python/Java. If a task needs to depend on the env or cluster services, it will not use this setting. As for I/O, I think users can perceive and accept this part of the overhead, or we can add an I/O monitor and reject the command when I/O is busy.

> Using SSH to execute shell commands usually requires escaping a lot of special characters for different task types. And I think this is a huge workload for subsequent maintenance.

SSH is the same as running the command on the local machine: we just send an ssh execute command to the remote server, exactly as we do locally, because all of the detailed commands are saved in a separate script. So we do not need a big change; just scp all tmp files to the remote host and run the main script.

> Using SSH means that the task running status and running logs need to be monitored by the masterServer. This may lead to high load on the masterServer's node when the number of tasks is quite large.

No need for that: the worker will monitor the input stream and error stream and print them to the worker logs, so this part needs no change.

For this case, I think we just need the following changes:

1. Add an SSH model, and add UI management for it.
2. Add an SSH selection in the Shell/Python task setting page.
3. Add the SSH information to the task instance and context.
4. If ssh != null, shell/python will scp all tmp files to the SSH server and run the execute command, then fetch the run result and print it into the logs. Stop/kill/timeout/failover are handled by the shell task itself.
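To make the "just scp the tmp files and run the main script" idea concrete, here is a minimal Python sketch of building the command pair. All names (`build_remote_commands`, the paths, the host) are illustrative assumptions, not DolphinScheduler APIs; the point is that only the remote script path needs quoting, since the per-task commands live inside the script itself.

```python
import shlex

def build_remote_commands(host, user, local_tmp_dir, remote_tmp_dir, main_script):
    """Build the scp command that ships the task's tmp files to the remote
    host, and the ssh command that runs the main script there.
    Illustrative only -- not an actual DolphinScheduler API."""
    scp_cmd = [
        "scp", "-r",
        local_tmp_dir,                      # local execute dir of the task
        f"{user}@{host}:{remote_tmp_dir}",  # remote staging dir
    ]
    remote_script = f"{remote_tmp_dir}/{main_script}"
    # Only the script path is quoted; ssh passes the remote command as a
    # single argument, so no per-task escaping of the script body is needed.
    ssh_cmd = ["ssh", f"{user}@{host}", f"sh {shlex.quote(remote_script)}"]
    return scp_cmd, ssh_cmd

scp_cmd, ssh_cmd = build_remote_commands(
    "10.0.0.8", "dolphin",
    "/tmp/dolphinscheduler/exec/123", "/tmp/ds-remote", "123_node.sh")
```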

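The claim that the worker, not the master, monitors the running logs could look roughly like the sketch below: the worker pumps the subprocess's stdout/stderr into its own log, exactly as it already does for a local shell task. This is a minimal Python sketch; the real worker is Java, and a local `sh -c` stands in here for the actual `ssh` command.

```python
import subprocess
import threading

def run_and_log(cmd, log):
    """Run cmd and stream its stdout/stderr lines into the worker log,
    returning the exit code. Stand-in for the worker's log tailing."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)

    def pump(stream, tag):
        # Read until the stream closes, tagging out vs err lines.
        for line in stream:
            log.append(f"[{tag}] {line.rstrip()}")

    threads = [threading.Thread(target=pump, args=(proc.stdout, "out")),
               threading.Thread(target=pump, args=(proc.stderr, "err"))]
    for t in threads:
        t.start()
    exit_code = proc.wait()
    for t in threads:
        t.join()
    return exit_code

log = []
# In the real flow cmd would be the ssh command built for the task.
code = run_and_log(["sh", "-c", "echo task started; echo oops >&2"], log)
```

Because the streams are consumed on the worker, stop/kill/timeout can also be enforced there by terminating the subprocess, without involving the master.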