[ https://issues.apache.org/jira/browse/AIRFLOW-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062699#comment-17062699 ]

Felipe Lolas commented on AIRFLOW-3863:
---------------------------------------

Yes!

Basically, I modified spark_submit_hook.py and added an ssh_con_id argument. When it's 
defined, the hook handles everything remotely, like moving resources (files) 
and polling logs.

 

I can make a PR over the weekend... meanwhile, you can check the code here: 
https://gist.github.com/flolas/2f745270a37cb3c748d4fe9aa8b08214
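To illustrate the idea, here is a minimal sketch of how a spark-submit invocation could be wrapped to run on a remote edge node over SSH. The function name and signature are hypothetical, not the actual gist API; a real implementation would also need to copy resource files and stream logs back.

```python
# Hypothetical sketch: build a spark-submit command that executes on a
# remote edge node via ssh instead of locally. Names here are illustrative.
import shlex


def build_remote_spark_submit(ssh_host, application, conf=None,
                              spark_binary="spark-submit"):
    """Wrap a spark-submit command so it runs on a remote host via ssh."""
    cmd = [spark_binary]
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(application)
    # Quote the remote command so spaces and special characters
    # survive the extra shell layer introduced by the SSH hop.
    remote_cmd = " ".join(shlex.quote(part) for part in cmd)
    return ["ssh", ssh_host, remote_cmd]


print(build_remote_spark_submit(
    "edge-node-1", "/apps/job.py",
    conf={"spark.executor.memory": "4g"}))
```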

> Make SparkSubmitHook capable of executing spark-submit through SSH Connection
> -----------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3863
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3863
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks, operators
>            Reporter: Felipe Lolas
>            Assignee: Felipe Lolas
>            Priority: Trivial
>
> Hi!
> I want to add a feature to SparkSubmitHook: connect to a remote server 
> through SSH and execute spark-submit there. This would be great when working with 
> multiple Spark clusters, each with its own edge node, where installing an 
> Airflow worker on the edge node is not possible.
> I'm currently implementing this, but I want to hear some thoughts from the Airflow 
> community about the solution and whether it should be committed or not!
> Cheers!
> Felipe



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
