[
https://issues.apache.org/jira/browse/AIRFLOW-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xifeng reassigned AIRFLOW-6212:
-------------------------------
Assignee: xifeng
> SparkSubmitHook failed to execute spark-submit to standalone cluster
> --------------------------------------------------------------------
>
> Key: AIRFLOW-6212
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6212
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks, operators
> Affects Versions: 1.10.6
> Reporter: Albertus Kelvin
> Assignee: xifeng
> Priority: Trivial
>
> I was trying to submit a PySpark job with spark-submit using
> SparkSubmitOperator. I had already set up the master via the
> environment variable (AIRFLOW_CONN_SPARK_DEFAULT). The value was something
> like *spark://host:port*.
> However, an exception occurred:
> {noformat}
> airflow.exceptions.AirflowException: Cannot execute: ['path/to/spark-submit',
> '--master', 'host:port', 'job.py']
> {noformat}
> It turns out that the master should have *spark://* preceding the host:port. I
> checked the code and found that this case wasn't handled.
> {code:python}
> conn = self.get_connection(self._conn_id)
> if conn.port:
>     conn_data['master'] = "{}:{}".format(conn.host, conn.port)
> else:
>     conn_data['master'] = conn.host
> {code}
> I think the protocol prefix should be added, for example:
> {code:python}
> conn_data['master'] = "spark://{}:{}".format(conn.host, conn.port)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)