[ 
https://issues.apache.org/jira/browse/AIRFLOW-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Albertus Kelvin reassigned AIRFLOW-6212:
----------------------------------------

    Assignee:     (was: Albertus Kelvin)

> SparkSubmitHook failed to execute spark-submit to standalone cluster
> --------------------------------------------------------------------
>
>                 Key: AIRFLOW-6212
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6212
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks, operators
>    Affects Versions: 1.10.6
>            Reporter: Albertus Kelvin
>            Priority: Trivial
>
> I was trying to submit a PySpark job with spark-submit using the 
> SparkSubmitOperator. I had already set the master via the 
> AIRFLOW_CONN_SPARK_DEFAULT environment variable; the value was something 
> like *spark://host:port*.
> However, an exception occurred: 
> {noformat}
> airflow.exceptions.AirflowException: Cannot execute: ['path/to/spark-submit', 
> '--master', 'host:port', 'job.py']
> {noformat}
> It turns out that the master URL should have *spark://* preceding the 
> host:port. I checked the code and found that this case wasn't handled:
> {code:python}
> conn = self.get_connection(self._conn_id)
> if conn.port:
>     conn_data['master'] = "{}:{}".format(conn.host, conn.port)
> else:
>     conn_data['master'] = conn.host
> {code}
> I think the protocol should be added in both branches, e.g.:
> {code:python}
> if conn.port:
>     conn_data['master'] = "spark://{}:{}".format(conn.host, conn.port)
> else:
>     conn_data['master'] = "spark://{}".format(conn.host)
> {code}
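For illustration, here is a minimal sketch of the idea. Note that `build_master_url` is a hypothetical helper, not Airflow's actual code, and it assumes that masters such as *yarn* and *local[...]*, or hosts that already carry a scheme, must be left untouched:

```python
def build_master_url(host, port=None):
    # "yarn" and "local[...]" masters take no scheme or port.
    if host == "yarn" or host.startswith("local"):
        return host
    master = "{}:{}".format(host, port) if port else host
    # Prepend spark:// only when no scheme is present already.
    if "://" not in master:
        master = "spark://" + master
    return master
```

With this, a connection defined as *host:port* and one defined as *spark://host:port* both produce a master URL that spark-submit accepts.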



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
