DarkAssassinator commented on issue #11652:
URL: 
https://github.com/apache/dolphinscheduler/issues/11652#issuecomment-1250034498

   > These are my humble opinions. If you have any questions, please let me 
know. @DarkAssassinator
   
   Hi @SbloodyS, thank you so much for your suggestions.
   > I think it's better not to use userName/passWord in ssh since there are 
some security risks. Using pam file or authorized_key in ssh is a more secure 
way. 
   
   You are right from a security point of view, but this increases the cost of 
use for users, because they would need to create and distribute authorized_key 
files for many hosts. Maybe we can add password/authorized_key options in the 
UI so that users can select the SSH authentication policy.
   
   
   > In common usage scenarios, the masterServer/apiServer's node usually does 
not contain the permission to use HDFS and S3. These permissions are usually 
included in the workerServer's node. It requires these permissions on user's 
masterServer/apiServer's node if using scp command to transfer the files to the 
task server. In addition, downloading files from the masterServer/apiServer's 
node and then scping them to the task node will waste network IO and hard disk 
IO for some large files or large number of small files.
   
   Sure, but not all tasks are suitable for running on remote servers; probably 
only Shell/Python/Java tasks are. If a task depends on the local environment or 
on cluster services, it simply would not enable this setting. As for I/O, I 
think users can perceive and accept this overhead, or we could add an I/O 
monitor and reject the command when I/O is busy.
   
   > Using SSH to execute shell commands usually requires escaping a lot of 
special characters for different task type. And I think this is a huge workload 
for subsequent maintenance.
   
   Running a command over ssh is the same as running it on the local machine: 
we just send an ssh execute command to the remote server, exactly as we would 
locally, because all the command details live in the generated script. So we do 
not need a big change: just scp all the temp files to the remote host and run 
the main script.
   
   > Using SSH means that the task running status and running logs need to be 
monitored by the masterServer. This may lead to high load on masterServer's 
node when the number of tasks is quite large.
   
   No need for that: the worker already consumes the process's output stream 
and error stream and prints them to the worker logs, so this part needs no 
change at all.
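A rough sketch of that point (not the actual worker code): because the worker drains the process streams into its own log, it makes no difference whether the launched command is local or an `ssh ...` invocation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Minimal sketch: launch a process and drain its output into the log.
// The underlying command could equally be "bash script.sh" or
// "ssh user@host bash script.sh" -- log collection is identical.
public class StreamLogger {

    public static int runAndLog(ProcessBuilder builder)
            throws IOException, InterruptedException {
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("[task] " + line); // stand-in for the worker logger
            }
        }
        return process.waitFor();
    }
}
```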
   
   For this case, I think we just need to make the following changes:
   1. Add an SSH model, and add UI management for it.
   2. Add an SSH selection to the Shell/Python task setting page.
   3. Add the SSH information to the task instance and context.
   4. If ssh != null, the shell/python task will scp all temp files to the SSH 
server and run the execute command there, then collect the run result and print 
it into the logs. Stop/kill/timeout/failover are handled by the shell task 
itself.
   

