baolsen commented on issue #10874:
URL: https://github.com/apache/airflow/issues/10874#issuecomment-882286124


   Thank you all for the feedback.
   
   My use case is to split and zip some files on a remote server if they exceed a specific size.
   
   To do this, I do the following over SSH:
   - List the input files and check their sizes
   - Zip and split the largest ones using a Python script I send over SSH
   - Check the output by iterating over the output files
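
The first step, listing files and picking out the large ones, could be done with a single remote command whose output is parsed locally. A minimal sketch, assuming the remote host has GNU `stat` and the task runs something like `stat -c '%s %n' /data/in/*`, which prints one `<size-in-bytes> <name>` line per file. The path, the threshold and `files_above_threshold` are hypothetical:

```python
from typing import List


def files_above_threshold(stat_output: str, threshold_bytes: int) -> List[str]:
    """Return names of files larger than threshold_bytes.

    Expects one "<size> <name>" line per file, as printed by GNU
    `stat -c '%s %n'` (an assumption about the remote host).
    """
    selected = []
    for line in stat_output.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the first space only, so file names containing spaces survive.
        size_str, name = line.split(" ", 1)
        if int(size_str) > threshold_bytes:
            selected.append(name)
    return selected


sample = "1048576 /data/in/a.csv\n512 /data/in/b.csv\n5242880 /data/in/big file.csv\n"
print(files_above_threshold(sample, 1_000_000))
# ['/data/in/a.csv', '/data/in/big file.csv']
```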
   
   I put these steps in a single task using CustomSSHOperator, which inherits from SSHOperator. I want my task to:
   
   - Be able to resume if the process fails on file N
   - Simplify support, so we only re-run one task if it fails
   - Avoid the performance issues we would hit with the Airflow 1.10.x scheduler if we had many tasks
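
Resuming on file N could look like the sketch below: a set of completed files is checked before each file and updated only after that file succeeds, so a retry skips finished work. Here the set lives in memory; in a real task it would have to survive between attempts (for example, by checking for existing output files on the remote server). That persistence is an assumption, not something the comment describes, and `process_resumable` and `flaky` are hypothetical names:

```python
from typing import Callable, Iterable, Set


def process_resumable(files: Iterable[str], done: Set[str],
                      process: Callable[[str], None]) -> Set[str]:
    """Process each file not already in `done`, recording progress as it goes.

    If `process` raises on file N, files before N are already in `done`,
    so calling again with the same `done` set resumes at file N.
    """
    for name in files:
        if name in done:
            continue  # finished in an earlier attempt
        process(name)   # may raise; progress so far stays in `done`
        done.add(name)  # mark only after success


    return done


# Usage: a processor that fails the first time it sees "b".
done: Set[str] = set()
calls = []

def flaky(name: str) -> None:
    calls.append(name)
    if name == "b" and calls.count("b") == 1:
        raise RuntimeError("simulated failure on b")

try:
    process_resumable(["a", "b", "c"], done, flaky)
except RuntimeError:
    pass
print(sorted(done))  # ['a']

process_resumable(["a", "b", "c"], done, flaky)  # the retry skips "a"
print(calls)         # ['a', 'b', 'b', 'c']
```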
   
   In my environment, connections (both local and remote) are expensive, complex and error-prone, mainly because of how corporate authentication is implemented at my site.
   
   However, SSHOperator does not allow multiple commands to run over one connection, even when subclassed, so I had to find a way to run multiple commands over the same connection.
   
   I could have subclassed SSHOperator and copied all of the "boilerplate" client code.
   That code is complex and doesn't directly add value to my use case, and I don't want to maintain and update it myself.
   
   So, for each command, I ended up setting self.command and calling super().execute(), as that was simplest.
   I agree that being convenient for me doesn't make it the right way to solve this ;)
   
   One option could be to refactor SSHOperator to isolate the "run an SSH command" part, so that subclasses can reuse it.
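
One way such a refactor could look, sketched outside Airflow so it runs on its own: a base class opens the connection once, and a reusable `run_ssh_command` method runs one command on that open client and checks its exit status. The class and method names are hypothetical and not the SSHOperator API. The client is assumed to behave like a paramiko `SSHClient` (`exec_command` returning `(stdin, stdout, stderr)`), `connect` stands in for something like `SSHHook.get_conn()`, and `FakeClient` is a stand-in so the sketch runs without a server:

```python
from typing import Any, Callable, List


class MultiCommandSSHTask:
    """Hypothetical base: one connection, many commands."""

    def __init__(self, connect: Callable[[], Any]):
        self._connect = connect  # returns a paramiko-like client (assumption)

    def run_ssh_command(self, client: Any, command: str) -> str:
        """Run one command on an already open client; raise on non-zero exit."""
        _stdin, stdout, stderr = client.exec_command(command)
        # Read output before waiting for the exit status so a full
        # output buffer cannot stall the remote command.
        out = stdout.read().decode()
        err = stderr.read().decode()
        status = stdout.channel.recv_exit_status()
        if status != 0:
            raise RuntimeError(f"{command!r} exited {status}: {err}")
        return out

    def run(self, client: Any) -> Any:
        """Subclasses override this and call run_ssh_command as often as needed."""
        raise NotImplementedError

    def execute(self) -> Any:
        client = self._connect()  # the only connection for the whole task
        try:
            return self.run(client)
        finally:
            client.close()


class RunAll(MultiCommandSSHTask):
    """Example subclass: run a list of commands over the same connection."""

    def __init__(self, connect: Callable[[], Any], commands: List[str]):
        super().__init__(connect)
        self.commands = commands

    def run(self, client: Any) -> List[str]:
        return [self.run_ssh_command(client, c) for c in self.commands]


# --- Stand-in client so the sketch runs without an SSH server ---
class _FakeChannel:
    def __init__(self, status: int):
        self._status = status

    def recv_exit_status(self) -> int:
        return self._status


class _FakeStream:
    def __init__(self, data: bytes, status: int):
        self._data = data
        self.channel = _FakeChannel(status)

    def read(self) -> bytes:
        return self._data


class FakeClient:
    """Mimics the parts of paramiko.SSHClient used above; echoes each command."""

    def __init__(self):
        self.commands: List[str] = []
        self.closed = False

    def exec_command(self, command: str):
        self.commands.append(command)
        return None, _FakeStream(f"ran {command}\n".encode(), 0), _FakeStream(b"", 0)

    def close(self) -> None:
        self.closed = True


client = FakeClient()
task = RunAll(lambda: client, ["ls /data/in", "python3 split.py"])
print(task.execute())  # ['ran ls /data/in\n', 'ran python3 split.py\n']
print(client.closed)   # True
```

The design keeps connection handling in one place (`execute`), so a subclass only decides which commands to run and in what order.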
   
   But we don't seem to have decided yet whether the hook is a better place for this. If it is, I think the changes could be more extensive and have a bigger impact.

