[ https://issues.apache.org/jira/browse/AIRFLOW-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885708#comment-15885708 ]
ASF subversion and git services commented on AIRFLOW-802:
---------------------------------------------------------
Commit 5831652f3fb1a6f296c8852513ed38427c897dd6 in incubator-airflow's branch
refs/heads/master from [~Fokko]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=5831652 ]
[AIRFLOW-802][AIRFLOW-1] Add spark-submit operator/hook
Add an operator for spark-submit to kick off Apache Spark jobs from
Airflow. This allows the user to maintain the configuration of the
master and YARN queue within Airflow by using connections.
Add a default connection_id to the initdb routine to set Spark to YARN
by default. Add unit tests to verify the behaviour of the spark-submit
operator and hook.
Closes #2042 from Fokko/airflow-802
> Integration of spark-submit
> ---------------------------
>
> Key: AIRFLOW-802
> URL: https://issues.apache.org/jira/browse/AIRFLOW-802
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Fix For: 1.9.0
>
>
> Hi all,
> I would like to add a spark-submit operator and hook. Right now we only
> support spark-sql operations through the SparkSqlOperator. Since spark-submit
> is a different binary, I've created a new hook and operator.
> spark-submit has many more options to configure than spark-sql, and future
> features might share functionality with the spark-sql operator.
> One main implementation difference between the spark-sql and spark-submit
> operators is the use of Airflow's connection_id: the spark-submit operator
> accepts a connection that sets the master and the queue.
> Cheers!
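
To illustrate the connection-based configuration described above, here is a minimal sketch of how a master and queue taken from a connection could be turned into a spark-submit command line. This is not the code from the commit: the function name build_spark_submit_command and the dict-shaped connection are hypothetical stand-ins for an Airflow connection. Only the --master and --queue flags are real spark-submit options, and the YARN fallback mirrors the default mentioned in the commit message.

```python
from typing import Dict, List, Optional


def build_spark_submit_command(
    application: str,
    connection: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build a spark-submit argument list from a connection-like mapping.

    `connection` is a hypothetical stand-in for an Airflow connection and
    may carry the keys "master" and "queue".
    """
    # Assumption: fall back to YARN when no connection is supplied.
    conn = connection or {"master": "yarn"}

    command = ["spark-submit", "--master", conn.get("master", "yarn")]

    # --queue is only passed when the connection names a queue.
    queue = conn.get("queue")
    if queue:
        command += ["--queue", queue]

    # The application (jar or Python file) always comes last.
    command.append(application)
    return command


if __name__ == "__main__":
    example = build_spark_submit_command(
        "/path/to/job.py", {"master": "yarn", "queue": "analytics"}
    )
    print(" ".join(example))
```

Keeping the master and queue in one mapping means a DAG only has to reference a connection name, and the cluster settings can change without touching the DAG code.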