Hi! I'm currently working on adding SSH super-powers to SparkSubmitOperator. It's really simple: it uses SSHHook plus a small wrapper around the connection to mimic the Popen interface. We are using it internally at our company because we have several secured Spark clusters with different software versions, and it would be really difficult to manage either an Airflow worker installation on every cluster or a spark-submit binary installed on the Airflow worker itself. I think this is a common problem.
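To make the idea concrete, here is a minimal sketch of the wrapper (the names here are illustrative, not the actual WIP code, and it assumes the paramiko-based SSHHook from contrib): it exposes just the subset of the Popen interface that SparkSubmitHook consumes, a line-iterable stdout and a wait() that returns the exit code, so the local and remote submit paths can share the same log-parsing code.

    # Illustrative sketch only -- names are hypothetical, not the WIP code.
    from airflow.contrib.hooks.ssh_hook import SSHHook

    class SSHPopenWrapper(object):
        """Mimics the subset of Popen that SparkSubmitHook uses:
        a line-iterable ``stdout`` and ``wait()`` returning the exit code."""

        def __init__(self, ssh_hook, command):
            client = ssh_hook.get_conn()            # paramiko.SSHClient
            _, stdout, _ = client.exec_command(command)
            self.stdout = stdout                    # file-like, iterable by line
            self._channel = stdout.channel
            self.returncode = None

        def wait(self):
            # Blocks until the remote command exits, like Popen.wait()
            self.returncode = self._channel.recv_exit_status()
            return self.returncode

    # Inside the hook, the remote case then reads just like the local one:
    # proc = SSHPopenWrapper(SSHHook(ssh_conn_id='spark_cluster'),
    #                        'spark-submit --master yarn app.py')
    # for line in proc.stdout:
    #     ...  # parse YARN application id, forward logs
    # proc.wait()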
I want to know whether others would like this kind of feature; if so, I can continue the work, adding tests and documentation and opening a PR. I would also be happy to hear ideas, concerns, etc. about this approach, and any other feedback from the Airflow community. The WIP code is available at https://github.com/flolas/airflow/blob/5bc837a03d226718f78eecbf4c637de222280adc/airflow/contrib/hooks/spark_submit_hook.py

Cheers,
Felipe L.
