[
https://issues.apache.org/jira/browse/AIRFLOW-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048411#comment-17048411
]
ASF GitHub Bot commented on AIRFLOW-6212:
-----------------------------------------
stale[bot] commented on pull request #7075: [AIRFLOW-6212] SparkSubmitHook
resolve connection
URL: https://github.com/apache/airflow/pull/7075
----------------------------------------------------------------
> SparkSubmitHook failed to execute spark-submit to standalone cluster
> --------------------------------------------------------------------
>
> Key: AIRFLOW-6212
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6212
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks, operators
> Affects Versions: 1.10.6
> Reporter: Albertus Kelvin
> Assignee: xifeng
> Priority: Trivial
>
> I was trying to submit a PySpark job with spark-submit using
> SparkSubmitOperator. I had already set the master via the
> AIRFLOW_CONN_SPARK_DEFAULT environment variable, with a value like
> *spark://host:port*.
> However, an exception occurred:
> {noformat}
> airflow.exceptions.AirflowException: Cannot execute: ['path/to/spark-submit',
> '--master', 'host:port', 'job.py']
> {noformat}
> It turns out that the master must have *spark://* preceding host:port. I
> checked the code and found that this case wasn't handled:
> {code:python}
> conn = self.get_connection(self._conn_id)
> if conn.port:
>     conn_data['master'] = "{}:{}".format(conn.host, conn.port)
> else:
>     conn_data['master'] = conn.host
> {code}
> I think the scheme should be prepended, as in the following:
> {code:python}
> conn_data['master'] = "spark://{}:{}".format(conn.host, conn.port)
> {code}