[
https://issues.apache.org/jira/browse/AIRFLOW-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062699#comment-17062699
]
Felipe Lolas commented on AIRFLOW-3863:
---------------------------------------
Yes!
Basically, I modified spark_submit_hook.py and added an ssh_con_id arg. When it's
defined, the process handles everything remotely, like moving resources (files)
and polling logs.
I can make a PR over the weekend... meanwhile you can check the code here:
https://gist.github.com/flolas/2f745270a37cb3c748d4fe9aa8b08214
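As a rough illustration of the idea (this is not the gist code; the class, host, and job path below are all hypothetical), the hook could assemble the spark-submit invocation locally, run it on the edge node through the system ssh client, and stream stdout back so the worker can poll the driver logs:

```python
import shlex
import subprocess


class RemoteSparkSubmit:
    """Sketch: run spark-submit on a remote edge node over SSH."""

    def __init__(self, host, application, conf=None):
        self.host = host                # edge node reachable via SSH
        self.application = application  # job path on the remote host
        self.conf = conf or {}          # spark conf key/value pairs

    def _build_command(self):
        # Assemble the spark-submit invocation that will run remotely.
        cmd = ["spark-submit"]
        for key, value in self.conf.items():
            cmd += ["--conf", f"{key}={value}"]
        cmd.append(self.application)
        return cmd

    def run(self):
        # Quote each argument so it survives the SSH hop intact, then
        # stream the remote process output line by line (log polling).
        remote = " ".join(shlex.quote(part) for part in self._build_command())
        proc = subprocess.Popen(
            ["ssh", self.host, remote],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        for line in proc.stdout:
            print(line.rstrip())
        return proc.wait()
```

In the actual hook, the SSH connection details would presumably come from the Airflow connection referenced by the new argument rather than being passed in directly.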
> Make SparkSubmitHook capable of executing spark-submit through SSH Connection
> -----------------------------------------------------------------------------
>
> Key: AIRFLOW-3863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3863
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks, operators
> Reporter: Felipe Lolas
> Assignee: Felipe Lolas
> Priority: Trivial
>
> Hi!
> I want to add a functionality to SparkSubmitHook: connect to a remote server
> through SSH and execute spark-submit there. This would be great when working
> with multiple Spark clusters that each have an edge node, and installing an
> Airflow worker on the edge node is not possible.
> I'm currently implementing this, but I want to hear some thoughts from the
> Airflow community about the solution and whether it should be committed or not!
> Cheers!
> Felipe
--
This message was sent by Atlassian Jira
(v8.3.4#803005)